
Unraveling Universally Closest Refinements via Symmetric Density Decomposition and Fisher Market Equilibrium

T-H. Hubert Chan, Department of Computer Science, University of Hong Kong, Hong Kong
Quan Xue, Department of Computer Science, University of Hong Kong, Hong Kong
Abstract

We investigate the closest distribution refinements problem, which takes as input a vertex-weighted bipartite graph, where the vertex weights on each side sum to 1 and represent a probability distribution. A refinement of one side's distribution is an edge distribution that corresponds to distributing the weight of each vertex from that side to its incident edges. The objective is to identify a pair of distribution refinements for the two sides of the bipartite graph such that the two edge distributions are as close as possible with respect to a specific divergence notion. This problem generalizes transportation: transportation corresponds to the special case in which the two closest distributions are identical. The problem has recently emerged in the context of composing differentially oblivious mechanisms.

Our main result demonstrates that a universal refinement pair exists, which is simultaneously closest under all divergence notions that satisfy the data processing inequality. Since differential obliviousness can be examined using various divergence notions, such a universally closest refinement pair offers a powerful tool in relation to such applications.

We discover that this pair can be achieved via locally verifiable optimality conditions. Specifically, we observe that it is equivalent to the following problems, which have been traditionally studied in distinct research communities: (1) hypergraph density decomposition, and (2) symmetric Fisher Market equilibrium.

We adopt a symmetric perspective of hypergraph density decomposition, in which hyperedges and nodes play equivalent roles. This symmetric decomposition serves as a tool for deriving precise characterizations of optimal solutions for other problems and enables the application of algorithms from one problem to another. This connection allows existing algorithms for computing or approximating the Fisher market equilibrium to be adapted for all the aforementioned problems. For example, this approach allows the well-known iterative proportional response process to provide approximations for the corresponding problems with multiplicative error in distributed settings, whereas previously, only absolute error had been achieved in these contexts. Our study contributes to the understanding of various problems within a unified framework, which may serve as a foundation for connecting other problems in the future.

Keywords and phrases:
closest distribution refinements, density decomposition, Fisher market equilibrium
Funding:
T-H. Hubert Chan: This research was partially supported by the Hong Kong RGC grants 17201823, 17203122 and 17202121.
Copyright and License:
© T-H. Hubert Chan and Quan Xue; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing → Distribution functions; Theory of computation → Mathematical optimization
Related Version:
Full Version: https://arxiv.org/abs/2406.17964 [12]
Editors:
Raghu Meka

1 Introduction

In this work, we explore several existing and new problems under the same unifying framework. By unraveling previously unknown connections between these seemingly unrelated problems, we offer deeper insights into them. Moreover, existing solutions for one problem can readily offer improvements to other problems. Input instances for these problems all have the following form.
Input Instance. An instance is a vertex-weighted bipartite graph with vertex bipartition V = V(0) ∪ V(1), edge set E, and positive vertex weights w: V → ℝ>0 (vertices with zero weight are removed and identified with the empty set ∅). For some problems, it is convenient to normalize the vertex weights into distributions, i.e., the vertex weights on each side sum to 1; in this case, we call this a distribution instance. We use 𝔹 := {0, 1} to index the two sides. Given a side ι ∈ 𝔹, we use ῑ := ι ⊕ 1 to denote the other side.
Solution Concept. We shall see that in each problem, refinement is the underlying solution concept. Given a side ι ∈ 𝔹, a refinement α(ι) of the vertex weights on side ι indicates how each vertex i ∈ V(ι) distributes its weight w(i) among its incident edges in E. (Since the vertex weights on one side can be interpreted as a probability distribution, allocating the weight of each vertex to its incident edges can be viewed as a refinement of the original distribution.) In other words, α(ι) is a vector of edge weights in E such that for each i ∈ V(ι), the sum of the weights of the edges incident on i equals w(i). While some problems may require refinements of the vertex weights from one side only, it will still be insightful to consider a refinement pair α = (α(0), α(1)) of vertex weights on both sides.
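Before proceeding, the following minimal Python sketch (with hypothetical toy data; none of the names come from the paper) makes the instance and refinement concepts concrete:

```python
from collections import defaultdict

# Hypothetical toy instance: sides V(0) = {a, b} and V(1) = {x, y},
# edge set E, and vertex weights w summing to 1 on each side.
E = [("a", "x"), ("a", "y"), ("b", "y")]
w = {"a": 0.5, "b": 0.5, "x": 0.4, "y": 0.6}

def is_refinement(alpha, side_vertices, w):
    """A refinement assigns non-negative weights to edges so that the
    weights of the edges incident on each vertex i of the given side
    sum to exactly w(i)."""
    total = defaultdict(float)
    for (i, _), weight in alpha.items():
        assert weight >= 0
        total[i] += weight
    return all(abs(total[i] - w[i]) < 1e-9 for i in side_vertices)

# A refinement of side 0: vertex a splits its weight; b sends all to y.
alpha0 = {("a", "x"): 0.2, ("a", "y"): 0.3, ("b", "y"): 0.5}
print(is_refinement(alpha0, ["a", "b"], w))  # True
```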

The main problem we focus on is the closest distribution refinement problem, which we show is intricately related to other problems. In probability theory, a divergence notion quantifies how close two distributions are to each other, where a smaller quantity indicates that they are closer. Common examples of divergences include the total variation distance and the KL-divergence. Observe that a divergence notion 𝖣 need not be symmetric, i.e., for two distributions P and Q on the same sample space, it is possible that 𝖣(P ‖ Q) ≠ 𝖣(Q ‖ P).

Definition 1 (Closest Distribution Refinement Problem).

Given a distribution instance (V(0), V(1); E; w) and a divergence notion 𝖣, the goal is to find a distribution refinement pair α = (α(0), α(1)) such that the distributions α(0) and α(1) on the edges are as close to each other as possible under 𝖣, i.e., 𝖣(α(0) ‖ α(1)) is minimized.

Universally Closest Refinement Pair. It is not clear a priori whether a refinement pair achieving the minimum for one divergence notion would also achieve the minimum for another divergence notion. For a non-symmetric divergence 𝖣, a pair α = (α(0), α(1)) minimizing 𝖣(α(0) ‖ α(1)) may not necessarily achieve the minimum of 𝖣(α(1) ‖ α(0)). The special case when α consists of a pair of identical refinements is also known as w-transportation or perfect fractional matching [9, 27, 34, 6], in which case α is obviously the minimizer for all divergences.

In general, it would seem ambitious to request a universally closest refinement pair that simultaneously minimizes all “reasonable” divergence notions. In the literature, all commonly used divergence notions satisfy the data processing inequality, which intuitively says that the divergence between two distributions cannot increase after post-processing. Formally, for any deterministic function φ: Ω → Ω̂, the induced distributions φ(P) and φ(Q) on Ω̂ satisfy the inequality 𝖣(φ(P) ‖ φ(Q)) ≤ 𝖣(P ‖ Q). Surprisingly, we show that such a universally closest pair exists.

Theorem 2 (Universally Closest Refinement Pair).

Given a distribution instance (V(0), V(1); E; w), there exists a distribution refinement pair α = (α(0), α(1)) that minimizes 𝖣(α(0) ‖ α(1)) simultaneously for all divergence notions 𝖣 that satisfy the data processing inequality.

Moreover, this pair can be returned using the near-linear time algorithm of Chen et al. [14] for the minimum quadratic-cost flow problem.

Since Definition 1 gives a clean mathematical problem, we first describe the technical contributions and approaches of this work, and we describe its motivation and application in Section 1.2.

While the technical proofs are given in Sections 4 and 5, we give some motivation for why the problem is studied in the first place and sketch the key ideas for our proofs.

1.1 Our Technical Contributions and Approaches

In addition to the problem in Definition 1, we shall see that the following solution properties are key to solving the other problems as well.
Desirable Solution Properties. Two desirable solution properties are local maximin and proportional response.

  • Local Maximin. A refinement α(ι) of the vertex weights on side ι can also be interpreted as describing how each vertex in V(ι) allocates its weight to its neighbors on the other side V(ῑ) via the edges in E. Specifically, for i ∈ V(ι) and j ∈ V(ῑ), α(ι)(ij) is the payload received by j from i, where we rely on the superscript in α(ι) to indicate that the payload is sent from side ι to side ῑ along an edge in E. Consequently, α(ι) induces a payload vector p ∈ ℝ^{V(ῑ)}, where p(j) = ∑_{i: i∼j} α(ι)(ij) is the total payload received by j in the allocation α(ι). Recalling that vertex j also has a weight, the payload density (resulting from α(ι)) of j is ρ(j) := p(j)/w(j).

    The refinement α(ι) is locally maximin if, for each i ∈ V(ι), vertex i distributes a non-zero payload α(ι)(ij) > 0 to vertex j only if j is among the neighbors of i with the minimum payload density (resulting from α(ι)).

    The intuition of the term “maximin” is that vertex i tries to maximize the minimum payload density among its neighbors by giving its weight to only neighbors that achieve the minimum payload density.

  • Proportional Response. Given a refinement α(ι) of the vertex weights on side ι, its proportional response α(ῑ) is a refinement of the vertex weights on the other side ῑ. Specifically, each vertex j ∈ V(ῑ) distributes its weight w(j) proportionately according to its received payloads: α(ῑ)(ij) = (α(ι)(ij)/p(j))·w(j). (A minimal code sketch of both properties follows this list.)
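The following minimal sketch (continuing the hypothetical toy data from the earlier sketch) computes the payload, the payload densities, and the proportional response:

```python
from collections import defaultdict

# Toy data from the previous sketch (hypothetical).
w = {"a": 0.5, "b": 0.5, "x": 0.4, "y": 0.6}
alpha0 = {("a", "x"): 0.2, ("a", "y"): 0.3, ("b", "y"): 0.5}

def payload_density(alpha, w):
    """Payload p(j) = total edge weight received by j, and density
    rho(j) = p(j) / w(j), on the receiving side."""
    p = defaultdict(float)
    for (_, j), x in alpha.items():
        p[j] += x
    return p, {j: p[j] / w[j] for j in p}

def proportional_response(alpha, w):
    """Each receiver j re-distributes its own weight w(j) in proportion
    to the payloads it received: alpha_bar(ij) = alpha(ij) / p(j) * w(j)."""
    p, _ = payload_density(alpha, w)
    return {(i, j): x / p[j] * w[j] for (i, j), x in alpha.items()}

p, rho = payload_density(alpha0, w)  # p = {x: 0.2, y: 0.8}; rho = {x: 0.5, y: 4/3}
alpha1 = proportional_response(alpha0, w)
# alpha1 = {(a,x): 0.4, (a,y): 0.225, (b,y): 0.375}, a refinement of side 1.
```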

Several problems in the literature studied in different contexts can be expressed with the same goal.

Goal 3 (High-Level Goal).

Given an instance (V(0), V(1); E; w), find a refinement pair α = (α(0), α(1)) such that both refinements are locally maximin and are proportional responses to each other.

This abstract problem has been studied extensively in the literature in different contexts, and various algorithms to exactly achieve or approximate (in a sense we will explain) Goal 3 have been designed. One contribution of this work is to unify all these problems under our abstract framework. Consequently, we readily obtain new approximation results for some problems that were unknown before this connection was discovered.

We next describe how our abstract framework is related to each considered problem and what new insights or results are achieved, which also provide the tools to solve or approximate the problem in Definition 1.

1.1.1 Hypergraph Density Decomposition

An instance (V(0), V(1); E; w) is interpreted as a hypergraph H. Each element i ∈ V(0) is a hyperedge and each element j ∈ V(1) is a node, where {i, j} ∈ E means that node j is contained in hyperedge i. Observe that in this hypergraph, both the hyperedges and the nodes are weighted.

Densest Subset Problem.

Given a subset S ⊆ V(1) of nodes in a (weighted) hypergraph, its density is ρ(S) := w(H[S])/w(S), where H[S] is the collection of hyperedges containing only nodes inside S. The densest subset problem requests a subset S with the maximum density.

Density Decomposition.

The decomposition can be described by an iterative process that generates a density vector ρ ∈ ℝ^{V(1)} for the nodes in the hypergraph by repeatedly peeling off densest subsets. First, the (unique) maximal densest subset S1 is obtained, and each j ∈ S1 is assigned the density ρ(j) = w(H[S1])/w(S1) of S1. Then, the hyperedges H[S1] and the subset S1 are removed from the instance; note that any hyperedge containing a node outside S1 remains. The process is applied to the remaining instance until eventually every node in V(1) has been removed and assigned its density. The collection of subsets found in the iterative process forms the density decomposition V(1) = ∪k≥1 Sk.
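To illustrate the definition, here is a brute-force sketch of the peeling process; finding the maximal densest subset by enumerating all subsets is exponential, so this is meant only for tiny hypothetical instances (efficient algorithms are discussed below):

```python
from itertools import combinations

def density_decomposition(nodes, hyperedges, w):
    """Repeatedly peel off the maximal densest node subset.
    `hyperedges` maps each hyperedge name to its node set; `w` weights
    both nodes and hyperedges.  Brute force over subsets: exponential,
    for tiny illustrative instances only."""
    rho = {}
    nodes, hyperedges = set(nodes), dict(hyperedges)
    while nodes:
        best, best_S = -1.0, None
        for k in range(1, len(nodes) + 1):
            for S in map(set, combinations(nodes, k)):
                inside = [e for e, ns in hyperedges.items() if ns <= S]
                dens = sum(w[e] for e in inside) / sum(w[v] for v in S)
                # among densest subsets, keep the (unique) maximal one
                if dens > best + 1e-12 or (abs(dens - best) <= 1e-12
                                           and len(S) > len(best_S)):
                    best, best_S = dens, S
        for v in best_S:
            rho[v] = best
        nodes -= best_S
        # hyperedges fully inside best_S are removed with it; the rest
        # lose their edges to the removed nodes
        hyperedges = {e: ns - best_S for e, ns in hyperedges.items()
                      if not ns <= best_S}
    return rho

# Hypothetical tiny hypergraph: e1 = {x, y}, e2 = {y}, unit weights.
print(density_decomposition({"x", "y"}, {"e1": {"x", "y"}, "e2": {"y"}},
                            {"e1": 1, "e2": 1, "x": 1, "y": 1}))
```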

As mentioned in [22], the earliest variant of this decomposition was introduced by Fujishige [19] in the context of polymatroids. We describe the related work in more detail in Section 1.3.

Connection with the Local Maximin Condition.

Charikar [13] gave an LP (that can be easily generalized to weighted hypergraphs) which solves the densest subset problem exactly, and a refinement α(0) on the weights of the hyperedges V(0) corresponds exactly to a feasible solution of the dual LP. Replacing the linear objective in this dual LP with a quadratic one while keeping the same feasible variables α(0), Danisch et al. [15] considered a quadratic program (that has also appeared in [19]). They showed that a refinement corresponds to an optimal solution of the quadratic program iff it is locally maximin (instead of the term “maximin”, the same concept is called “stability” in [15]). Moreover, a locally maximin refinement also induces the aforementioned density vector ρ.

As observed in [22], the quadratic program can be expressed as a minimum quadratic-cost flow problem, which can be solved optimally using the nearly-linear time algorithm of Chen et al. [14]. Since we shall show that all of our considered problems are equivalent to Goal 3, this implies the algorithmic result in Theorem 2.

Symmetry of Density Decomposition.

We observe that the density decomposition is symmetric between the hyperedges and the nodes. Recall that in iteration k of the density decomposition, a subset Sk is produced, together with its corresponding collection Ŝk of hyperedges that determines the density w(Ŝk)/w(Sk). Therefore, in addition to a partition of the nodes, there is a corresponding decomposition of the hyperedges V(0) = ∪k≥1 Ŝk.

Given an abstract instance (V(0), V(1); E; w), one could have interpreted it differently, with V(0) as nodes and V(1) as hyperedges. When the density decomposition is constructed with the roles reversed, the exact same sequence {(Ŝk, Sk)}k of pairs is obtained, but in reversed order. Intuitively, (Ŝ1, S1) is the densest from the perspective of S1 ⊆ V(1), but it is the least dense from the perspective of Ŝ1 ⊆ V(0).

The symmetric nature of the density decomposition has been mentioned recently by Harb et al. [23, Theorem 13], but we shall explain later that this fact may also be inferred when different earlier works are viewed together. Hence, this observation may be considered part of the folklore. Since we need some crucial properties of the symmetric density decomposition to prove Theorem 2, for the sake of completeness, the connection between Goal 3 and this folklore result is described in Section 3.

Approximation Notions.

Even though the exact density decomposition can be achieved via LP or maximum flow, it can be computationally expensive in practice. In the literature, efficient approximation algorithms have been considered. While there are different attempts to define approximation notions for density decompositions, it is less clear what they mean when they are applied to different problems.

Observe that a refinement pair α = (α(0), α(1)) achieving Goal 3 may not be unique, but the induced payload vectors (p(0), p(1)) and density vectors (ρ(0), ρ(1)) are unique. When we speak of approximating Goal 3, we mean finding a refinement pair such that the induced payload and density vectors have some approximation guarantees (see Definition 12 for the approximation notions of multiplicative vs. absolute errors on vectors). We shall see that approximation guarantees on these vectors translate readily into approximation notions for the various problems.

1.1.2 Unraveling Universally Closest Distribution Refinements via Symmetric Density Decomposition

Restricted Class of Divergence.

As a first step towards proving Theorem 2, we consider a restricted class of divergences known as hockey-stick divergences. For each γ ≥ 0,

𝖣γ(P ‖ Q) := sup_{S⊆Ω} Q(S) − γ·P(S).

The hockey-stick divergence is related to the well-known (ϵ, δ)-differential privacy inequality. Specifically, 𝖣_{e^ϵ}(P ‖ Q) ≤ δ iff for all subsets S ⊆ Ω,

Q(S) ≤ e^ϵ·P(S) + δ.
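For finite sample spaces, the hockey-stick divergence reduces to a sum of positive parts over atoms; a minimal sketch with hypothetical data:

```python
def hockey_stick(P, Q, gamma):
    """D_gamma(P || Q) = sup_S Q(S) - gamma * P(S).  For discrete
    distributions, the supremum is attained by the set of atoms with
    Q(w) > gamma * P(w), so it reduces to a sum of positive parts."""
    return sum(max(q - gamma * p, 0.0) for p, q in zip(P, Q))

# Hypothetical example; with gamma = 1 this is the total variation distance.
print(hockey_stick([0.5, 0.5], [0.9, 0.1], 1.0))  # 0.4
```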

We first consider the universally closest distribution refinements with respect to the class of hockey-stick divergences 𝒟HS := {𝖣γ : γ ≥ 0}. In Lemma 31, we show that the restricted variant of Theorem 2 for the class 𝒟HS is equivalent to the following problem.

Universal Refinement Matching Problem.

Given an instance G = (V(0), V(1); E; w) and two non-negative weights c = (c(0), c(1)) ∈ ℝ≥0^𝔹, we use G(c) to denote the instance of (fractional) maximum matching on the bipartite graph (V(0), V(1); E) with vertex capacity constraints. For side ι and i ∈ V(ι), the capacity constraint at vertex i is c(ι)·w(i). A fractional solution assigns non-negative weights to the edges in E such that the weighted degree of every vertex does not exceed its capacity constraint. Observe that for any refinement pair α = (α(0), α(1)), the fractional matching cα defined below is a feasible solution to G(c):

cα(ij) := min{c(0)·α(0)(ij), c(1)·α(1)(ij)}  for all {i, j} ∈ E.

The goal is to find a refinement pair α=(α(0),α(1)) such that for any weight pair c, the edge weights cα form an optimal (fractional) solution to the maximum matching instance G(c).

The structure of the symmetric density decomposition described in Section 1.1.1 makes it convenient to apply a primal-dual analysis to achieve the following in Section 4.

Theorem 4 (Universal Maximum Refinement Matching).

Given an instance (V(0), V(1); E; w), suppose a distribution refinement pair α = (α(0), α(1)) satisfies Goal 3. Then, for any c = (c(0), c(1)), the (fractional) solution cα is optimal for the maximum matching instance G(c).

Extension to General Case.

To extend Theorem 2 beyond the restricted divergence class 𝒟HS, we consider the closest distribution refinement problem under a more general probability distance notion.
Capturing All Data Processing Divergences via Power Functions. As opposed to a divergence notion, which summarizes the closeness of two distributions by a single number, the power function 𝖯𝗈𝗐(P ‖ Q): [0,1] → [0,1] between distributions P and Q captures enough information to recover every divergence notion that satisfies the data processing inequality. We will give a formal explanation in Section 5. A main result is given in Theorem 37, which states that any distribution refinement pair satisfying Goal 3 achieves the minimal power function, from which Theorem 2 follows immediately.

Approximation Results.

In the full version [12], we show that a refinement that induces a payload with multiplicative error can lead to approximation guarantees for other problems. For the universal refinement matching problem, a natural approximation notion is based on the standard approximation ratio for the maximum matching problem.

On the other hand, achieving a meaningful approximation notion for the universally closest distribution refinement problem in terms of divergence values is more difficult. The reason is that any strictly increasing function on a data processing divergence still satisfies the data processing inequality. Therefore, any non-zero deviation from the correct answer may be magnified arbitrarily. Instead of giving approximation guarantees directly on the divergence values, we consider a technical notion of power function stretching, and the approximation quality of a refinement pair is quantified by how much the optimal power function is stretched. The formal statement is given in the full version [12].

1.1.3 Achieving Multiplicative Error via Proportional Response in Fisher Markets

We next interpret an instance (V(0), V(1); E; w) as a special symmetric case of linear Fisher markets [18], which are themselves a special case of the more general Arrow-Debreu markets [2]. This allows us to transfer any algorithmic results for Fisher/Arrow-Debreu market equilibria to counterparts for Goal 3. Specifically, we can use a quick iterative procedure, proportional response [39, 40], to approximate the densities of vertices in the density decomposition of Section 1.1.1 with multiplicative error, where previously only absolute error had been achieved [15, 22]. In addition to being interesting in its own right, multiplicative error is also used to achieve the approximation results in Section 1.1.2.

Market Interpretation.

Even though the roles of the two sides in an instance are symmetric, it will be intuitive to think of V(0) as buyers and V(1) as sellers. For a buyer i ∈ V(0), the weight w(i) is its budget. In the symmetric variant, the utilities of buyers have a special form. Each seller j ∈ V(1) owns one divisible unit of good j, for which a buyer i has a positive utility exactly when {i, j} ∈ E, in which case the utility per unit of good j is w(j) (which is common to all buyers having positive utility for j). In a general Fisher market, a buyer i can have an arbitrary utility wi(j) for one unit of good j.
Market Allocation and Equilibrium. For the symmetric variant, an allocation can be represented by a refinement pair α=(α(0),α(1)), in which α(1) must be a proportional response to α(0). For the refinement α(0) on the buyers’ budgets, the induced payload vector p(1) on the sellers corresponds to the price vector on goods. For the refinement α(1) on the sellers’ weights, the induced payload vector p(0) on the buyers corresponds to the utilities received by the buyers. It is well-known (explained in the full version [12]) that a market equilibrium is characterized by a locally maximin refinement α(0) of buyers’ budgets together with its proportional response α(1). From our theory of symmetric density decomposition in Section 1.1.1, this is equivalent to Goal 3.

Alternative Characterization.

The fact that the local maximin condition on the buyers’ budget allocation is equivalent to Fisher market equilibrium readily allows us to borrow any algorithmic results for market equilibrium to solve or approximate Goal 3. In the full version [12], we give details on the following observations.

  • In linear Fisher markets (not necessarily symmetric), we show that the local maximin condition on the sellers' goods allocation to buyers can be used to characterize a market equilibrium. This is an alternative characterization of linear Fisher market equilibrium that involves only the goods allocation and the buyers' utilities, without involving the prices of goods.

  • On the other hand, we show that there exists an Arrow-Debreu market with a goods allocation that satisfies the local maximin condition but is not an equilibrium.

1.2 Motivation and Application for Universally Closest Distribution Refinements

The problem in Definition 26 arises in the study of the composition of differentially oblivious mechanisms [41, 42], which is a generalization of the more well-known concept of differential privacy (DP) [17].

A Quick Introduction to Differential Privacy/Obliviousness.

Consider a (randomized) mechanism M that takes an input from space 𝒳 and returns an output from space 𝒴 that will be passed to a second mechanism. During the computation of M, an adversary can observe some view from space 𝒱, which in general need not include the output. Suppose at first we just consider the privacy/obliviousness of M on its own, which measures how much information the adversary can learn about the input from its view. Intuitively, it should be difficult for the adversary to distinguish between two similar inputs from the corresponding views produced by the mechanism. Formally, similar inputs in 𝒳 are captured by a symmetric neighboring relation, and the privacy/obliviousness of the mechanism requires that the two distributions of views produced by neighboring inputs are “close”, where closeness can be quantified by some notion of divergence. As mentioned in Section 1.1.2, the popular notion of (ϵ, δ)-DP is captured using hockey-stick divergences.

Composition of Differentially Oblivious Mechanisms.

Suppose the output of mechanism M from space 𝒴 is passed as an input to a second mechanism M′, which is differentially oblivious with respect to some neighboring relation on 𝒴. Hence, given input x to M, one should consider the joint distribution M(x) on the space 𝒱 × 𝒴 of views and outputs.

The notion of neighbor-preserving differential obliviousness (NPDO) has been introduced [41, 42] to give sufficient conditions on M that allow composition with such a mechanism M′ so that the overall composition preserves differential obliviousness.

Specifically, given neighboring inputs x0 ∼ x1, conditions are specified on the two distributions M(x0) and M(x1). Unlike the case for differential privacy/obliviousness, it is not appropriate to simply require that the two distributions be close. This is because the adversary can only observe the view component of 𝒱 × 𝒴. Moreover, one also needs to incorporate the neighboring relation on the output space 𝒴, which is needed for the privacy guarantee of the second mechanism M′.

The definition of NPDO is more conveniently described via the terminology offered by our abstract instance (V(0), V(1); E; w). Here, the sets V(0) and V(1) are two copies of 𝒱 × 𝒴. For each ι ∈ 𝔹, the weights w on V(ι) correspond to the distribution M(xι). Finally, (v0, y0) ∈ V(0) and (v1, y1) ∈ V(1) are neighbors in E iff v0 = v1 and y0 ∼ y1 are neighboring in 𝒴.

In the original definition [42], the mechanism M is said to be d-NPDO with respect to a divergence 𝖣 if, for the constructed instance (V(0), V(1); E; w), there exists a distribution refinement pair α = (α(0), α(1)), which may depend on the divergence 𝖣, such that 𝖣(α(0) ‖ α(1)) ≤ d. Examples of considered divergences include the standard hockey-stick divergence and the Rényi divergence [31].

Since it only makes sense to consider divergences satisfying the data processing inequality in the privacy context, the universally closest refinement pair guaranteed in Theorem 2 makes the notion of NPDO conceptually simpler, because the closest refinement pair does not need to specifically depend on the considered divergence.

We believe that this simplification will be a very useful tool. In fact, to achieve advanced composition in [42], a very complicated argument has been used to prove the following fact, which is just an easy corollary of Theorem 2.

Fact 5 (Lemma B.1 in [42]).

Given a distribution instance (V(0), V(1); E; w), suppose there exist two refinement pairs αs = (αs(0), αs(1)) and αt = (αt(0), αt(1)) such that the hockey-stick divergences 𝖣γ(αs(0) ‖ αs(1)) and 𝖣γ(αt(1) ‖ αt(0)) are both at most δ.

Then, there exists a single refinement pair α = (α(0), α(1)) such that both 𝖣γ(α(0) ‖ α(1)) and 𝖣γ(α(1) ‖ α(0)) are at most δ.

1.3 Related Work

The densest subset problem is a classical problem in combinatorial optimization with numerous real-world applications in data mining, network analysis, and machine learning [21, 13, 26, 30, 1, 35, 29], and it is the basic subroutine in defining the density decomposition.

Density Decomposition.

As mentioned in Section 1.1.1, the density decomposition and its associated quadratic program were first introduced by Fujishige [19]. Even though this decomposition was defined for the more general submodular or supermodular case, the procedure has been rediscovered independently many times in the literature for the special bipartite graph case (or, equivalently, its hypergraph interpretation).

Earlier we described the decomposition procedure by repeatedly peeling off maximal densest subsets in a hypergraph. This was the approach of Tatti and Gionis [37, 36] and Danisch et al. [15]. Moreover, the same procedure has been used to define hypergraph Laplacians in the context of the spectral properties of hypergraphs [11].

In the literature, there is another line of research that defines the decomposition by repeatedly peeling off least-dense subsets instead. However, given a subset S ⊆ V of nodes in a weighted hypergraph H = (V, E), a different notion of density is considered: 𝔯(S) := w(E(S))/w(S), where E(S) is the collection of hyperedges that have non-empty intersection with S. The difference is that previously we considered E[S], the collection of hyperedges totally contained inside S. Then, an alternative procedure can be defined by peeling off a maximal least-dense subset under 𝔯 and recursing on the remaining graph.

This was the approach used by Goel et al. [20] to study the communication and streaming complexity of maximum bipartite matching. The same procedure has been used by Bernstein et al. [7] to analyze the replacement cost of online maximum bipartite matching maintenance. Lee and Singla [28] also used this decomposition to investigate the batch arrival model of online bipartite matching. Bansal and Cohen [4] independently discovered this decomposition procedure to study the maximin allocation of hyperedge weights to incident nodes.

Symmetric Nature of Density Decomposition.

At this point, the reader may ask how we know that the two procedures give the same decomposition (but in reversed order). The answer is that exactly the same quadratic program has been used in [15] and [7] to analyze the corresponding approaches. This observation has also been made recently by Harb et al. [23, Theorem 13]. It is not too difficult to see that the least-dense subset under the modified density 𝔯 corresponds to the densest subset with the roles of vertices and hyperedges reversed. Therefore, Theorem 20 can also be obtained indirectly through these observations.

Connection between Density Decomposition and Fisher Market Equilibrium.

Even before this work in which we highlight the local maximin condition, density decomposition has already surreptitiously appeared in algorithms computing Fisher market equilibrium. Jain and Vazirani [24] considered submodular utility allocation (SUA) markets, which generalize linear Fisher markets. Their procedure [24, Algorithm 9] for computing an SUA market equilibrium can be viewed as a masqueraded variant of the density decomposition by Fujishige [19]. Buchbinder et al. [10] used a variant of SUA market equilibrium to analyze online maintenance of matroid intersections, where equilibrium prices are also obtained by the density decomposition procedure.

In this work, we observe conversely that algorithms for approximating market equilibria can also be used directly to approximate density decompositions. For the simple special case in which the input graph contains a perfect fractional matching, Motwani et al. [32] showed that several randomized balls-into-bins procedures attempting to maintain the local maximin condition can achieve a (1+ϵ)-approximation.

Distributed Iterative Approximation Algorithms.

Even though efficient exact algorithms are known to achieve Goal 3, distributed iterative approximation algorithms have been considered for practical scenarios.

Density Decomposition.

By applying the first-order iterative Frank-Wolfe method to the quadratic program, Danisch et al. [15] showed that T iterations, each of which takes O(|E|) work and may be parallelized, achieve an absolute error of O(1/√T) on the density vector ρ. Harb et al. [23] showed that the method Greedy++ can be viewed as a noisy variant of Frank-Wolfe and hence has a similar convergence guarantee.

By using a first-order method augmented with Nesterov momentum [33], such as accelerated FISTA [5], Elfarouk et al. [22] readily concluded that T iterations can achieve an absolute error of O(1/T). However, observe that if there is a vertex i such that ρ(i) ≤ ϵ, an absolute error of ϵ may produce an estimate of ρ(i) whose multiplicative error is arbitrarily large.

Inspired by estimates of ρ, several notions of approximate density decompositions have been considered in previous works [37, 15, 22]. However, it is not clear whether there is any meaningful interpretation of these approximate decompositions in the context of closest distribution refinements or market equilibria.
Fisher Market Equilibrium. Market dynamics have been considered to investigate how agents' mutual interactions can lead to convergence to a market equilibrium. The proportional response (PR, aka tit-for-tat) protocol was used by Wu and Zhang [39] to analyze bandwidth trading in a peer-to-peer network G = (V, E) with vertex weights w. This can be seen as a special case of our instance (V(0), V(1); E; w), where V(0) and V(1) are two copies of V, and i ∈ V(0) and j ∈ V(1) are neighbors in the instance iff the corresponding pair {i, j} is an edge of G. Interestingly, they consider a bottleneck decomposition of the graph G, which can be recovered from our symmetric density decomposition. They showed that the PR protocol converges to some equilibrium under this special model.

Subsequently, Zhang [40] showed that the PR protocol can be applied in Fisher markets between buyers and sellers, and analyzed the convergence rate to the market equilibrium in terms of multiplicative error. Birnbaum et al. [8] later improved the analysis of the PR protocol by demonstrating that it can be viewed as a generalized gradient descent algorithm.
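A minimal sketch of such alternating proportional-response dynamics on a hypothetical toy instance (the uniform initialization and the fixed round count are our own illustrative choices; the convergence guarantees are those established in the cited works):

```python
from collections import defaultdict

def pr_dynamics(edges, w, rounds=100):
    """Alternating proportional responses: side 0 starts by splitting its
    weight uniformly over incident edges; then the two sides repeatedly
    respond to each other.  Bids are keyed by (sender, receiver)."""
    deg = defaultdict(int)
    for i, _ in edges:
        deg[i] += 1
    alpha = {(i, j): w[i] / deg[i] for i, j in edges}
    for _ in range(2 * rounds):          # an even number of half-steps,
        p = defaultdict(float)           # so we end on a side-0 refinement
        for (_, r), x in alpha.items():
            p[r] += x
        alpha = {(r, s): x / p[r] * w[r] for (s, r), x in alpha.items()}
    return alpha

# Hypothetical toy market: the induced densities approach those of the
# density decomposition.
edges = [("a", "x"), ("b", "x"), ("b", "y")]
w = {"a": 0.8, "b": 0.2, "x": 0.5, "y": 0.5}
alpha0 = pr_dynamics(edges, w)
```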

1.4 Potential Impacts and Paper Organization

This work unifies diverse problems under a single framework, revealing connections and providing new approximation results. Future research directions are abundant. Examining the implications of various market models or equilibrium concepts on the densest subset problem and density decomposition opens the door for further exploration of the relationships between economic markets and combinatorial optimization problems. This line of inquiry could also lead to new insights into economics, particularly regarding market participants’ behavior and the efficiency of different market mechanisms.
Paper Organization. In Section 2, we introduce the precise notation that will be used throughout the paper. In Section 3, we summarize the structural results regarding the symmetric density decomposition, which is used as a technical tool to analyze other problems. As mentioned in the introduction, Section 4 investigates the universal refinement matching problem which is the precursor for tackling the more challenging universally closest distribution refinements problem in Section 5. In Section 5.1, we first consider the special case of hockey-stick divergences; by considering power functions, we prove Theorem 2 in Section 5.2.
Additional Materials in the Full Version. In the full version [12], we present approximation results for the universal closest refinements problem by defining a stretching operation on power functions. Moreover, we recap the connection between the symmetric density decomposition and the symmetric Fisher market equilibrium. We give an alternative characterization of the Fisher market equilibrium in terms of the local maximin condition on sellers’ goods allocation to buyers. Furthermore, we give details on the convergence rates of how the iterative proportional response protocol can achieve an approximation with multiplicative error for Goal 3.

2 Preliminaries

We shall consider several problems, whose inputs are in the form of vertex-weighted bipartite graphs.

Definition 6 (Input Instance).

An instance (V(0), V(1); E; w) of the input consists of the following:

  • Two disjoint vertex sets V(0) and V(1).

    For a side ι ∈ 𝔹 := {0, 1}, we use ῑ := ι ⊕ 1 to indicate the other side, different from ι.

  • A collection E of unordered pairs of vertices that is interpreted as a bipartite graph between the vertex sets V(0) and V(1); in other words, if f = {i, j} ∈ E, then exactly one element of f is in V(0) and the other is in V(1). In this case, we also write i ∼ j or simply ij ∈ E.

    A vertex is isolated if it has no neighbor.

  • Positive vertex weights w: V(0) ∪ V(1) → ℝ>0 (vertices with zero weight do not play any role and may be excluded); for a vertex i, we use wi or w(i) to denote its weight.

    For ι ∈ 𝔹, we also use w(ι) ∈ ℝ^{V(ι)} to denote the weight vector of each side.

This notation highlights the symmetry between the two sets V(0) and V(1).

 Remark 7.

In this work, we focus on the case where V(0) and V(1) are finite. Even though it is possible to extend the results to continuous and uncountable sets (which are relevant when those sets are interpreted as continuous probability spaces), the mathematical notation and tools involved are outside the scope of the typical computer science community.

Definition 8 (Allocation Refinement and Payload).

Given an input instance as in Definition 6, for side ι ∈ 𝔹, an allocation refinement (or simply refinement) α(ι) ∈ ℝ≥0^E of w(ι) ∈ ℝ^{V(ι)} can be interpreted as edge weights on E such that for each non-isolated vertex i ∈ V(ι),

w(ι)(i) = ∑_{f∈E: i∈f} α(ι)(f).

In other words, if a vertex i has at least one neighbor, it distributes its weight wi among its incident edges {i, j} such that i ∼ j. These edge weights due to the refinement α(ι) can be interpreted as being received by the vertices on the other side V(ῑ); this induces a payload vector p(ῑ) ∈ ℝ^{V(ῑ)} defined by:

∀ j ∈ V(ῑ),  p(ῑ)(j) := ∑_{f∈E: j∈f} α(ι)(f).

With respect to the weights of the vertices in V(ῑ), the (payload) density ρ(ῑ) ∈ ℝ^{V(ῑ)} is defined by:

ρ(ῑ)(j) := p(ῑ)(j)/w(j).

When there is no risk of ambiguity, the superscripts (ι) and (ι¯) may be omitted.

Definition 9 (Proportional Response).

Suppose for side ι ∈ 𝔹, α(ι) is a refinement of the vertex weights on V(ι). Then, a proportional response to α(ι) is a refinement α(ῑ) of the vertex weights on the other side V(ῑ) satisfying the following:

  • If j ∈ V(ῑ) receives a positive payload p(ῑ)(j) > 0 from α(ι), then for each i ∼ j,

    α(ῑ)(ij) = (α(ι)(ij)/p(ῑ)(j))·w(j) = α(ι)(ij)/ρ(ῑ)(j).

  • If a non-isolated vertex j ∈ V(ῑ) receives zero payload from α(ι), then in α(ῑ), vertex j may distribute its weight w(j) arbitrarily among its incident edges in E.

Definition 10 (Local Maximin Condition).

Given an input instance as in Definition 6, for ι ∈ 𝔹, suppose a refinement α(ι) ∈ ℝ≥0^E of w(ι) ∈ ℝ^{V(ι)} induces the payload density vector ρ(ῑ) ∈ ℝ^{V(ῑ)} as in Definition 8. Then, α(ι) is locally maximin if:

∀ i ∈ V(ι):  α(ι)(ij) > 0  ⟹  j ∈ argmin_{ℓ∈V(ῑ): ℓ∼i} ρ(ῑ)(ℓ).

In other words, the local maximin condition states that a vertex i ∈ V(ι) sends a positive weight to a neighbor j ∼ i only if j achieves the minimum of ρ(ῑ)(ℓ) among the neighbors ℓ of i.

 Remark 11.

Observe that if α(ι) satisfies the local maximin condition, then every non-isolated vertex in V(ῑ) must receive a positive payload. This means that its proportional response α(ῑ) is uniquely determined.

Definition 12 (Notions of Error).

Given a vector ρ, we consider two notions of error for an approximation vector ρ^.

  • Absolute error ϵ with respect to a norm ‖·‖: we have ‖ρ − ρ̂‖ ≤ ϵ.

    A common example is the standard 2-norm.

  • Multiplicative error ϵ: for all coordinates i, |ρ(i) − ρ̂(i)| ≤ ϵ·|ρ(i)|.

3 Folklore: Symmetric Density Decomposition

An instance (V(0), V(1); E; w), as in Definition 6, may be interpreted as a hypergraph H. We take the vertex set V = V(1) (aka the ground set) and let V(0) be the set of hyperedges (aka the auxiliary set), where v ∈ V(1) and e ∈ V(0) are neighbors in E iff the hyperedge e contains the vertex v. Moreover, w assigns weights to both the vertices and the hyperedges of the hypergraph.

Notation: Ground and Auxiliary Sets.

For side ι ∈ 𝔹, the open neighborhood of a non-empty subset S ⊆ V(ι) consists of the vertices on the other side that have at least one neighbor in S, i.e., N(S) := {j ∈ V(ῑ) : ∃ i ∈ S, i ∼ j}. Its closed neighborhood is N[S] := {j ∈ N(S) : N({j}) ⊆ S}, i.e., a vertex j is in N[S] iff it has at least one neighbor and all its neighbors are contained in S. The density of S is defined as ρ(ι)(S) := w(N[S])/w(S), where the superscript ι may be omitted when the context is clear. When we consider the densities of subsets of V(ι), the set V(ι) plays the role of the ground set and the set V(ῑ) plays the role of the auxiliary set. Observe that this definition is consistent with the hypergraph interpretation above when ι = 1, i.e., when V(1) plays the role of the vertex set V.
Extension to Empty Set. In some of our applications, it makes sense to have isolated vertices in the input instance. For S = ∅ ⊆ V(ι), we extend N(∅) = N[∅] := {j ∈ V(ῑ) : ∀ i ∈ V(ι), j ≁ i}. For the purpose of defining ρ(∅), we use the convention that 0/0 = 0 and x/0 = +∞ for x > 0.

Definition 13 (Densest Subset).

When V(ι) plays the role of the ground set, a subset S ⊆ V(ι) is a densest subset if it attains max_{S′⊆V(ι)} ρ(ι)(S′).

The following well-known fact states a property of the maximal densest subset (see [3, Lemma 4.1] for a proof).

Fact 14.

With respect to set inclusion, the maximal densest subset in V(ι) is unique and contains all densest subsets.

Definition 15 (Sub-Instance).

Given an instance (V(0), V(1); E; w), subsets S(0) ⊆ V(0) and S(1) ⊆ V(1) naturally induce a sub-instance (S(0), S(1); E′; w′), where E′ and w′ are the restrictions of E and w, respectively, to S(0) ∪ S(1).

The following definition is essentially the same as the density decomposition of a hypergraph [37, 36].

Definition 16 (Density Decomposition with Ground Set).

Given an instance (V(0), V(1); E; w), suppose for ι ∈ 𝔹 we consider V(ι) as the ground set. The density decomposition with V(ι) as the ground set is a sequence of pairs {(Sℓ(0), Sℓ(1))}ℓ≥1, together with a density vector ρ(ι): V(ι) → ℝ+, defined by the following iterative process.

Initially, we set G0 := (V(0), V(1); E; w) to be the original given instance, and initialize ℓ = 1.

  1. If Gℓ−1 is an instance where the vertex sets on both sides are empty, the process stops.

  2. Otherwise, let Sℓ(ι) be the maximal densest subset in Gℓ−1 with side ι being the ground set, where ρℓ is the corresponding density; let Sℓ(ῑ) := N[Sℓ(ι)] be its closed neighborhood in Gℓ−1.

     For each u ∈ Sℓ(ι), define ρ(ι)(u) := ρℓ = w(Sℓ(ῑ))/w(Sℓ(ι)).

  3. Let Gℓ be the sub-instance induced by removing Sℓ(ι) and Sℓ(ῑ) from Gℓ−1; note that edges in E incident to removed vertices are also removed.

     Increment ℓ and go to the next iteration.

 Remark 17 (Isolated Vertices).

Observe that if each side ι has a set U(ι) of isolated vertices that have no neighbors, then these sets also appear at the beginning and the end of the decomposition sequence as the pairs (U(0), ∅) and (∅, U(1)).

When V(ι) is the ground set, the density ρ(ι) of the vertices in U(ι) is zero. In the rest of this section, we may assume that isolated vertices have been removed from the instance.

The following fact has been proved in [15] when the vertices in the ground set have uniform weights. The same proof generalizes to the case where both sides have general positive weights.

Fact 18 (Local Maximin Condition Gives Density Decomposition).

Given an instance (V(0), V(1); E; w), consider side ι ∈ 𝔹 as the ground set. Suppose α(ῑ) is a refinement of the vertex weights w(ῑ) on side ῑ, which induces the payload density vector ρ(ι) on side ι as in Definition 8.

Suppose further that α(ῑ) satisfies the local maximin condition in Definition 10. Then, this induced payload density vector coincides exactly with the density vector ρ(ι) produced by the density decomposition in Definition 16 with side ι being the ground set.

Moreover, if (Sℓ(ῑ), Sℓ(ι)) is a pair in the density decomposition, then Sℓ(ῑ) is exactly the collection of vertices that send positive payload to Sℓ(ι) in α(ῑ).

Symmetry in Density Decomposition.

In Definition 16, one side ι ∈ 𝔹 plays the role of the ground set. If we switch the ground set to the other side ῑ, the same sequence {(Sℓ(0), Sℓ(1))}ℓ≥1 will be produced in the density decomposition, but in reversed order. While a similar observation has been made by Harb et al. [23, Theorem 13], we re-establish this fact by considering the proportional response to a locally maximin refinement in the following lemma, whose proof is given in the full version [12].

Lemma 19 (Proportional Response to Locally Maximin Refinement).

Let α(0) be a refinement of the vertex weights in V(0) that satisfies the local maximin condition. By Remark 11, suppose the refinement α(1) of the vertex weights in V(1) is the unique proportional response to α(0). Moreover, for each side ι ∈ 𝔹, the refinement α(ι) induces the payload density vector ρ(ῑ) on the vertices of the other side ῑ. Then, the following holds:

  • For any i ∈ V(0) and j ∈ V(1) such that i ∼ j and α(0)(ij) > 0, their received payload densities due to the refinements satisfy ρ(0)(i) = 1/ρ(1)(j).

  • The refinement α(1) also satisfies the local maximin condition.

  • We get back α(0) as the proportional response to α(1).
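As a sanity check, the reciprocal-density property in the first bullet can be verified numerically on a tiny hand-built instance (all names and numbers below are hypothetical):

```python
# Hypothetical instance: edges a-x, b-x, b-y with w(a) = 0.8, w(b) = 0.2
# and w(x) = w(y) = 0.5.  The refinement alpha0 below is locally maximin
# (b sends only to y, its neighbor of minimum payload density).
alpha0 = {("a", "x"): 0.8, ("b", "y"): 0.2}
rho1 = {"x": 0.8 / 0.5, "y": 0.2 / 0.5}      # densities on side 1: 1.6, 0.4
alpha1 = {("a", "x"): 0.5, ("b", "y"): 0.5}  # its proportional response
rho0 = {"a": 0.5 / 0.8, "b": 0.5 / 0.2}      # densities on side 0
# First bullet of Lemma 19: rho0(i) = 1 / rho1(j) on every support edge.
assert all(abs(rho0[i] - 1.0 / rho1[j]) < 1e-9 for (i, j) in alpha0)
```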

In addition to being used to establish the symmetry of the density decomposition in Theorem 20, Lemma 19 will be used later as a technical tool for analyzing the universal refinement matching problem in Section 4.

Theorem 20 (Folklore: Symmetry of Density Decomposition).

Given an instance (V(0), V(1); E; w), suppose the sequence {(Sℓ(0), Sℓ(1))}ℓ≥1 of pairs is the density decomposition using V(1) as the ground set.

Then, the density decomposition using V(0) as the ground set produces the same sequence of pairs in reversed order.

4 Universal Refinement Matching Problem

As mentioned in Section 1, an instance as in Definition 6 can be interpreted as the input of various problems. As a precursor to the universally closest distribution refinements problem studied in Section 5, we consider a matching problem. Here, an instance is interpreted as the input of the vertex-constrained fractional matching problem, where the vertex constraints on each side are scaled by the same multiplicative factor.

Definition 21 (Vertex-Constrained Matching Instance).

Given an instance G = (V(0), V(1); E; w) as in Definition 6 and two non-negative weights c = (c(0), c(1)) ∈ ℝ≥0^𝔹, we use G(c) to denote the instance of (fractional) matching on the bipartite graph (V(0), V(1); E) with vertex capacity constraints. For side ι and i ∈ V(ι), the capacity constraint at vertex i is c(ι)·w(i).

LP Interpretation.

The maximum (fractional) matching problem instance G(c) is captured by the following linear program.

𝖫𝖯(G(c)):  max ∑_{i∼j} xij
subject to  ∑_{j: i∼j} xij ≤ c(0)·w(i),  ∀ i ∈ V(0)
            ∑_{i: i∼j} xij ≤ c(1)·w(j),  ∀ j ∈ V(1)
            xij ≥ 0,  ∀ i ∈ V(0), j ∈ V(1) with i ∼ j

We will also consider its dual in our analysis:

𝖣𝖯(G(c)):  min c(0)·∑_{i∈V(0)} w(i)·λi + c(1)·∑_{j∈V(1)} w(j)·λj
subject to  λi + λj ≥ 1,  ∀ i ∈ V(0), j ∈ V(1) with i ∼ j
            λv ≥ 0,  ∀ v ∈ V(0) ∪ V(1)
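For completeness, 𝖫𝖯(G(c)) can be solved directly with an off-the-shelf LP solver; in the following sketch, the helper name and the encoding are our own, and vertex names are assumed distinct across sides:

```python
import numpy as np
from scipy.optimize import linprog

def max_fractional_matching(edges, w, c0, c1):
    """Solve LP(G(c)): maximize the total edge weight subject to the
    capacity c0*w(i) on each i in V(0) and c1*w(j) on each j in V(1)."""
    side0 = sorted({i for i, _ in edges})
    side1 = sorted({j for _, j in edges})
    row = {v: r for r, v in enumerate(side0 + side1)}
    A = np.zeros((len(side0) + len(side1), len(edges)))
    for col, (i, j) in enumerate(edges):
        A[row[i], col] = 1.0  # edge contributes to both endpoint degrees
        A[row[j], col] = 1.0
    b = np.array([c0 * w[i] for i in side0] + [c1 * w[j] for j in side1])
    res = linprog(-np.ones(len(edges)), A_ub=A, b_ub=b,
                  bounds=[(0, None)] * len(edges), method="highs")
    return -res.fun, dict(zip(edges, res.x))
```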
Definition 22 (Fractional Matching Induced by Refinement Pair).

Given an instance G = (V(0), V(1); E; w) and a non-negative weight pair c = (c(0), c(1)), a refinement pair α = (α(0), α(1)), where α(ι) is a refinement of the vertex weights in V(ι), induces a feasible fractional matching for G(c), denoted as follows:

cα(ij) := min{c(0)·α(0)(ij), c(1)·α(1)(ij)},  for all i ∼ j.

 Remark 23.

The feasibility of the fractional matching cα for the instance G(c) follows directly, because each α(ι) is a refinement of the vertex weights in V(ι).

Definition 24 (Universal Refinement Pair for Matching).

Given an instance G = (V(0), V(1); E; w) and τ ∈ [0, 1], a refinement pair α is (1−τ)-universal for matching in G if, for every non-negative weight pair c, the solution cα achieves a (1−τ)-approximation for the maximum matching problem on G(c).

For τ=0, we simply say α is universal (for matching in G).

Intuition for Universal Refinement Pair.

Besides being a technical tool for later sections, the notion of a universal refinement pair is interesting in terms of understanding the structure of vertex-constrained bipartite matching. Suppose it is known that the vertex weights on each side are correlated and may vary but follow some fixed relative ratios among themselves. The idea of a universal refinement pair is that some pre-computation can be performed such that when the vertex weights on each side change proportionally, the matching can be updated quickly to get a (nearly) optimal solution without computing from scratch.

Observe that the density decomposition in Definition 16 does not change if the vertex weights on one side are multiplied by the same scalar. Informally, the decomposition identifies different levels of “congestion” when a matching is performed between vertices on the two sides. The following result formalizes the connection between the density decomposition and universal refinement pairs.

Lemma 25 (Local Maximin Achieves Universal for Matching).

Suppose in an instance G, for side ι ∈ 𝔹, the refinement α(ι) of the vertex weights in V(ι) is locally maximin, and the refinement α(ῑ) is the (unique) proportional response to α(ι). Then, the refinement pair (α(ι), α(ῑ)) is universal for matching in G.

Moreover, suppose in the density decomposition, for side ι, ρ(ι) is the payload density vector and p(ι) is the payload vector. Then, for any c = (c(ι), c(ῑ)), the weight of the maximum matching in G(c) is:

∑_{j∈V(ι)} min{c(ι)/ρ(ι)(j), c(ῑ)}·p(ι)(j).

Proof.

By Lemma 19, α(ι) and α(ῑ) are proportional responses to each other and both are locally maximin. Fix some c = (c(0), c(1)); we show that cα is an optimal solution to G(c).

Let {(Sℓ(0), Sℓ(1))}ℓ≥1 be the density decomposition with V(1) being the ground set. (Recall that the density decomposition is symmetric between V(0) and V(1), where the forward or backward order of the sequence depends on which side is the ground set.)

From Fact 18, recall that α(0)(ij) and α(1)(ij) are non-zero only if there exists some ℓ such that (i, j) ∈ Eℓ := E ∩ (Sℓ(0) × Sℓ(1)).

Let L(1) := {ℓ : c(1)·w(Sℓ(1)) ≤ c(0)·w(Sℓ(0))} and L(0) := {ℓ : c(1)·w(Sℓ(1)) > c(0)·w(Sℓ(0))}.

Observe that for i ∈ Sℓ(1), ρ(1)(i) = w(Sℓ(0))/w(Sℓ(1)). Therefore, for all ℓ ∈ L(1) and ℓ′ ∈ L(0), we have ℓ < ℓ′.

Moreover, for all ℓ < ℓ′, for all i ∈ Sℓ(0) and j ∈ Sℓ′(1), the vertices i and j are not neighbors in E, from the definition of the density decomposition.

Hence, we can construct a feasible dual solution λ to 𝖣𝖯(G(c)) as follows.

  • For ℓ ∈ L(1): for i ∈ Sℓ(0), set λi = 0; for j ∈ Sℓ(1), set λj = 1.

  • For ℓ ∈ L(0): for i ∈ Sℓ(0), set λi = 1; for j ∈ Sℓ(1), set λj = 0.

The dual feasibility of λ holds because if i ∈ V(0) and j ∈ V(1) are such that λi = λj = 0, then it must be the case that i and j are not neighbors in E.

Next, we check that the dual solution λ has the same objective value as the primal solution cα to 𝖫𝖯(G(c)).

Observe that for ℓ ∈ L(1) and i ∈ Sℓ(0), j ∈ Sℓ(1) with i ∼ j, we have

c(1)·α(1)(ij) = c(1)·α(0)(ij)·w(Sℓ(1))/w(Sℓ(0)) ≤ c(0)·α(0)(ij).

Hence, c(1)·∑_{j∈Sℓ(1)} w(j)·λj = c(1)·∑_{(i,j)∈Eℓ} α(1)(ij) = ∑_{(i,j)∈Eℓ} cα(ij), where the first equality holds because α(1) restricted to Eℓ is a refinement of the vertex weights in Sℓ(1).

Similarly, for ℓ ∈ L(0), we have c(0)·∑_{i∈Sℓ(0)} w(i)·λi = ∑_{(i,j)∈Eℓ} cα(ij).

Therefore, the feasible primal solution cα and the feasible dual solution λ have the same objective value, which implies that both are optimal.

Finally, we compute the weight of the maximum matching in G(c). For simpler notation, it is understood that each sum below includes only pairs (i, j) such that both α(0)(ij) and α(1)(ij) are non-zero.

∑_{(i,j)∈(V(ῑ)×V(ι))} min{c(ῑ)·α(ῑ)(ij), c(ι)·α(ι)(ij)}
= ∑_{(i,j)} min{c(ῑ), c(ι)·α(ι)(ij)/α(ῑ)(ij)}·α(ῑ)(ij)
= ∑_{(i,j)} min{c(ῑ), c(ι)/ρ(ι)(j)}·α(ῑ)(ij)
= ∑_{j∈V(ι)} min{c(ῑ), c(ι)/ρ(ι)(j)}·∑_{i: i∼j} α(ῑ)(ij)
= ∑_{j∈V(ι)} min{c(ι)/ρ(ι)(j), c(ῑ)}·p(ι)(j).
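The closed-form expression in Lemma 25 can be checked numerically against the LP; the snippet below assumes the max_fractional_matching sketch given after the LP formulation and reuses the hypothetical maximin pair from the sketch in Section 3:

```python
# Hypothetical instance from the earlier sketch; rho0 and p0 are the
# density and payload vectors on side 0 induced by the maximin pair.
edges = [("a", "x"), ("b", "x"), ("b", "y")]
w = {"a": 0.8, "b": 0.2, "x": 0.5, "y": 0.5}
rho0 = {"a": 0.625, "b": 2.5}
p0 = {"a": 0.5, "b": 0.5}
for c0, c1 in [(1, 1), (2, 1), (0.5, 3)]:
    formula = sum(min(c0 / rho0[j], c1) * p0[j] for j in rho0)
    lp_value, _ = max_fractional_matching(edges, w, c0, c1)
    assert abs(formula - lp_value) < 1e-6
```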

5 Universally Closest Distribution Refinements Problem

In this section, we show that when an instance is interpreted as two distributions on two sample spaces imposed with a neighboring relation, a refinement pair satisfying Goal 3 is an optimal solution to the universally closest distribution refinements problem.

Distribution Instance.

In Definition 6, an input is a distribution instance if, in addition, we require that for each side ι ∈ 𝔹, the vertex weights satisfy ∑_{i∈V(ι)} w(ι)(i) = 1, i.e., w(ι) is a probability distribution on V(ι).

Observe that given a distribution instance G = (V(0), V(1); E; w), for each side ι ∈ 𝔹, a refinement α(ι) of the vertex weights w(ι) is a probability distribution on E. A high-level problem is to find a refinement pair α = (α(0), α(1)) such that the distributions α(0) and α(1) on E are as “close” to each other as possible.

Divergence Notion.

Given two distributions P and Q on the same sample space Ω, a notion of divergence 𝖣(P ‖ Q) measures how close the two distributions are, where a smaller value typically means that they are closer. An example is the total variation distance 𝖣TV(P ‖ Q) := sup_{E⊆Ω} |P(E) − Q(E)|. However, note that in general, a divergence need not be symmetric, i.e., it is possible that 𝖣(P ‖ Q) ≠ 𝖣(Q ‖ P). An example of a non-symmetric divergence notion is 𝖣KL(P ‖ Q) = ∑_{ω∈Ω} pω·ln(pω/qω).

Definition 26 (Closest Refinement Pair).

Given a distribution instance G = (V(0), V(1); E; w) and a divergence notion 𝖣, a refinement pair α = (α(0), α(1)) (as in Definition 8) is closest (with respect to 𝖣) if the pair attains the minimum of 𝖣(α(0) ‖ α(1)) among all refinement pairs, whose value we denote by 𝖣(G).

 Remark 27.

Since a divergence is typically continuous and the collection of refinement pairs forms a compact set, the minimum is attainable in such a case.

However, note that since 𝖣 is not necessarily symmetric, from Definition 26 alone, it is not clear whether a pair minimizing 𝖣(α(0) ‖ α(1)) would also minimize 𝖣(α(1) ‖ α(0)).

Definition 28 (Universally Closest Refinement Pair).

Given a distribution instance G = (V(0), V(1); E; w) and a collection 𝒟 of divergence notions, a refinement pair α = (α(0), α(1)) is universally closest (with respect to 𝒟) if, for all 𝖣 ∈ 𝒟, 𝖣(α(0) ‖ α(1)) = 𝖣(G).

5.1 Class of Hockey-Stick Divergences

We first consider the class of hockey-stick divergences.

Definition 29 (Hockey-Stick Divergence).

Given distributions P and Q on the same sample space Ω and γ ≥ 0, the (reversed) hockey-stick divergence is defined as

𝖣γ(P ‖ Q) := sup_{S⊆Ω} Q(S) − γ·P(S).

(In the literature, the hockey-stick divergence 𝖣̃γ is defined with the roles of P and Q interchanged: 𝖣̃γ(P ‖ Q) = 𝖣γ(Q ‖ P). We change the convention to make the notation consistent when we consider power functions later.)

 Remark 30.

The hockey-stick divergence is related to the well-known (ϵ, δ)-differential privacy inequality. Specifically, 𝖣_{e^ϵ}(P ‖ Q) ≤ δ iff for all subsets S ⊆ Ω,

Q(S) ≤ e^ϵ·P(S) + δ.

To express the other direction of the inequality, i.e., 𝖣_{e^ϵ}(Q ‖ P) ≤ δ, we could consider another parameter γ̂ ≥ 0 and keep the order of the arguments in 𝖣_{γ̂}(P ‖ Q). Specifically, by considering the complements of the events S in the above inequality, we can deduce that 𝖣_{e^ϵ}(Q ‖ P) ≤ δ iff 𝖣_{e^{−ϵ}}(P ‖ Q) ≤ 1 − e^{−ϵ}(1 − δ).

We consider universally closest distribution refinements with respect to the class of hockey-stick divergences 𝒟HS := {𝖣γ : γ ≥ 0}.

The following lemma shows that the hockey-stick divergence is closely related to the matching instance defined in Definition 21.

Lemma 31 (Hockey-Stick Divergence and Matching).

Suppose G = (V(0), V(1); E; w) is a distribution instance and α = (α(0), α(1)) is a refinement pair. For γ ≥ 0, consider the pair of weights c := (c(0) = γ, c(1) = 1).

Then, the weight of the fractional matching cα in the matching instance G(c) is exactly 1 − 𝖣γ(α(0) ‖ α(1)).

In other words, finding a closest refinement pair for the divergence 𝖣γ is equivalent to finding a maximum matching in G(c).

Proof.

Let S := {f ∈ E : α(1)(f) ≥ γ·α(0)(f)}.

Observe that 𝖣γ(α(0) ‖ α(1)) = ∑_{f∈S} (α(1)(f) − γ·α(0)(f)).

Finally, the weight of the matching cα is:

∑_{f∈E} min{γ·α(0)(f), α(1)(f)} = ∑_{f∈S} γ·α(0)(f) + ∑_{f∉S} α(1)(f)
= 1 − ∑_{f∈S} (α(1)(f) − γ·α(0)(f)) = 1 − 𝖣γ(α(0) ‖ α(1)),

as required.
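The identity in the proof is easy to verify numerically; a small sketch with a hypothetical refinement pair (the second refinement is the proportional response of the first):

```python
# Hypothetical refinement pair on three edges; both are probability
# distributions on the edges.
alpha0 = {("a", "x"): 0.2, ("a", "y"): 0.3, ("b", "y"): 0.5}
alpha1 = {("a", "x"): 0.4, ("a", "y"): 0.225, ("b", "y"): 0.375}
gamma = 1.2
c_alpha = {f: min(gamma * alpha0[f], alpha1[f]) for f in alpha0}
d_gamma = sum(max(alpha1[f] - gamma * alpha0[f], 0.0) for f in alpha0)
# Lemma 31: weight of c*alpha equals 1 - D_gamma(alpha0 || alpha1).
assert abs(sum(c_alpha.values()) - (1.0 - d_gamma)) < 1e-9
```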

Theorem 32 (Locally Maximin Pair is Closest for 𝒟HS).

Suppose in a distribution instance G, α = (α(0), α(1)) is a distribution refinement pair in which the two refinements are locally maximin and proportional responses to each other. Then, α is universally closest with respect to the class 𝒟HS of hockey-stick divergences.

Proof.

The statement is a direct consequence of Lemmas 25 and 31.

5.2 Class of Data Processing Divergences

Extending Theorem 32, we show that a locally maximin pair is actually the closest with respect to a wide class of divergences. In fact, for any reasonable usage of divergence, it is intuitive that taking a partial view of a pair of objects should not increase the divergence between them. This is formally captured by the data processing inequality.

Definition 33 (Data Processing Inequality).

A divergence notion 𝖣 satisfies the data processing inequality if, given any two distributions P and Q on a sample space Ω and any function φ: Ω → Ω̂, the induced distributions φ(P) and φ(Q) on Ω̂ satisfy the monotonicity property:

𝖣(φ(P) ‖ φ(Q)) ≤ 𝖣(P ‖ Q).

We use 𝒟DPI to denote the class of divergences satisfying the data processing inequality.

It is known that all divergences in 𝒟DPI between two distributions P and Q can be recovered from the power function between them. Recall that in this work, we focus on finite sample spaces; see Figure 1 for an example. It is known that power functions have a fractional knapsack interpretation [25].

Definition 34 (Power Function).

Suppose P and Q are distributions on the same sample space Ω.

  • For a discrete sample space Ω, the power function 𝖯𝗈𝗐(P ‖ Q): [0,1] → [0,1] can be defined in terms of the fractional knapsack problem.

    Given the collection Ω of items, suppose ω ∈ Ω has weight P(ω) and value Q(ω). Then, for x ∈ [0,1], 𝖯𝗈𝗐(P ‖ Q)(x) is the maximum value attainable under weight capacity x, where items may be taken fractionally.

  • For a continuous sample space Ω, given x ∈ [0,1], 𝖯𝗈𝗐(P ‖ Q)(x) is the supremum of Q(S) over measurable subsets S ⊆ Ω satisfying P(S) = x.

Figure 1: Panels (a) g = 𝖯𝗈𝗐(P ‖ Q) and (b) ĝ = 𝖯𝗈𝗐(Q ‖ P). Consider a sample space Ω with 6 elements indicated with different rainbow colors. Two distributions on Ω are given by the arrays P = [0, 0.1, 0.14, 0.11, 0.41, 0.24] and Q = [0.15, 0.3, 0.2, 0.1, 0.25, 0], where the elements ω ∈ Ω are sorted in decreasing order of the ratio Q(ω)/P(ω). The left figure shows the power curve g = 𝖯𝗈𝗐(P ‖ Q), and the right figure shows ĝ = 𝖯𝗈𝗐(Q ‖ P), which is the reflection of g about the line y = 1 − x.
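The fractional-knapsack definition translates directly into code; the following sketch computes the breakpoints of the concave piecewise-linear power curve for the distributions of Figure 1:

```python
def power_function(P, Q):
    """Breakpoints of Pow(P||Q): take atoms greedily in decreasing order
    of the value-to-weight ratio Q(w)/P(w) (fractional knapsack); the
    resulting curve is concave and piecewise linear."""
    ratio = lambda pq: pq[1] / pq[0] if pq[0] > 0 else float("inf")
    pts, x, y = [(0.0, 0.0)], 0.0, 0.0
    for p, q in sorted(zip(P, Q), key=ratio, reverse=True):
        x, y = x + p, y + q
        pts.append((x, y))
    return pts

# The distributions from Figure 1.
P = [0, 0.1, 0.14, 0.11, 0.41, 0.24]
Q = [0.15, 0.3, 0.2, 0.1, 0.25, 0]
curve = power_function(P, Q)
# The P-null atom gives the initial jump: the curve starts at (0, 0.15).
```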

In the literature (see [16] for instance), the tradeoff function 𝖳(P ‖ Q) := 1 − 𝖯𝗈𝗐(P ‖ Q), which has essentially equivalent mathematical properties, has also been considered. However, power functions have the same notational interpretation as divergences, in the sense that “smaller” means “closer” distributions. In fact, Vadhan and Zhang [38] defined an abstract concept of generalized probability distance (with the tradeoff function as the main example) such that the notation agrees with this interpretation. We recap some well-known properties of power functions.

Fact 35 (Valid power functions [16, Proposition 2.2]).

Suppose $g : [0,1] \to [0,1]$ is a function. Then, there exist distributions $X$ and $Y$ such that $\mathsf{Pow}(X\,\|\,Y) = g$ iff $g$ is continuous, concave, and $g(x) \geq x$ for all $x \in [0,1]$.

Partial Order on Power Functions.

We consider a partial order on power functions. Given such functions $g_1$ and $g_2$, we write $g_1 \leq g_2$ iff for all $x \in [0,1]$, $g_1(x) \leq g_2(x)$.

We next state formally how a divergence satisfying Definition 33 can be recovered from the power function.

Fact 36 (Recovering Divergence From Power Function [16, Proposition B.1]).

For any divergence $\mathsf{D} \in \mathcal{D}_{\mathsf{DPI}}$, there exists a functional $\widehat{\mathsf{D}}$ on functions $g : [0,1] \to [0,1]$ such that for any distributions $P$ and $Q$ on the same sample space, $\mathsf{D}(P\,\|\,Q) = \widehat{\mathsf{D}}(\mathsf{Pow}(P\,\|\,Q))$.

Moreover, if $g_1 \leq g_2$, then $\widehat{\mathsf{D}}(g_1) \leq \widehat{\mathsf{D}}(g_2)$.
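
As a concrete illustration (ours), the hockey-stick divergences fit this template: as Fact 40 below makes explicit, one may take

$\widehat{\mathsf{D}}_\gamma(g) = \max_{x \in [0,1]} \big( g(x) - \gamma x \big),$

and the special case $\gamma = 1$ recovers the total variation distance. The monotonicity in Fact 36 is then immediate: if $g_1 \leq g_2$ pointwise, the corresponding maxima are ordered the same way.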

The following is the main result of this section, which states that a locally maximin pair essentially solves the closest distribution refinement problem optimally in the most general sense.

Theorem 37 (Locally Maximin Pair Gives Minimal Power Function).

Suppose in a distribution instance $G$, $\alpha=(\alpha^{(0)},\alpha^{(1)})$ is a distribution refinement pair that is locally maximin and whose components are proportional responses to each other. Then, the power function $g$ between any other distribution refinement pair of $G$ satisfies $\mathsf{Pow}(\alpha^{(0)}\,\|\,\alpha^{(1)}) \leq g$. Consequently, any such locally maximin pair gives rise to the same power function, and we denote $\mathsf{Pow}(G) := \mathsf{Pow}(\alpha^{(0)}\,\|\,\alpha^{(1)})$.

From Fact 36, this implies that $\alpha$ is a universally closest refinement pair with respect to $\mathcal{D}_{\mathsf{DPI}}$.

 Remark 38.

Observe that given any $\mathsf{D} \in \mathcal{D}_{\mathsf{DPI}}$ and any strictly increasing function $\phi : \mathbb{R} \to \mathbb{R}$, we still have $\phi \circ \mathsf{D} \in \mathcal{D}_{\mathsf{DPI}}$. Since any deviation from the correct answer can be arbitrarily magnified by $\phi$, it is difficult to give any meaningful approximation guarantee on the value of a general divergence in $\mathcal{D}_{\mathsf{DPI}}$.

We prove Theorem 37 by analyzing the relationship between hockey-stick divergences and power functions, thereby extending the result of Theorem 32. As with convex functions, we also consider a notion of subgradient for power functions.

Definition 39 (Subgradient).

Suppose $g : [0,1] \to [0,1]$ is a power function and $x \in [0,1]$. Then, the subgradient of $g$ at $x$ is defined as the collection $\partial g(x)$ of real numbers satisfying:

$\gamma \in \partial g(x)$ iff for all $y \in [0,1]$, $g(x) + \gamma\,(y - x) \geq g(y)$.

In other words, the line segment with slope $\gamma$ touching the curve $g$ at $x$ never goes below the curve.
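
For instance (our illustration), when $g$ is piecewise linear and concave, as in Figure 1, the subgradient at a point in the interior of a linear piece is the singleton containing that piece's slope, while at a breakpoint where the slope drops from $s_i$ to $s_{i+1}$, it is the whole interval $[s_{i+1}, s_i]$.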

The following is a special case of Fact 36, and describes explicitly how a hockey-stick divergence is recovered from a power function. See Figure 2 for an illustration.

Fact 40 (Recovering Hockey-Stick Divergence From Power Function).

Suppose $P$ and $Q$ are distributions on the same sample space $\Omega$, where $g = \mathsf{Pow}(P\,\|\,Q)$ is the power function between them. Then, for any $\gamma \geq 0$, there exists $x \in [0,1]$ such that $\gamma \in \partial g(x)$ and $\mathsf{D}_\gamma(P\,\|\,Q) = g(x) - \gamma x$.

In other words, one can find a line segment with slope $\gamma$ touching the curve $g$, and the $y$-intercept of this line gives $\mathsf{D}_\gamma(P\,\|\,Q)$.

Proof.

The result is standard, but we include a short proof for completeness. One can imagine a line segment with slope $\gamma$ starting high above the curve and gradually being lowered until it touches the curve $g$ at some $x \in [0,1]$. The line segment may touch the curve at more than one point, in which case we take $x$ to be the infimum of the touching points.

We partition the sample space $\Omega$ into $S_1 := \{\omega \in \Omega : Q(\omega) > \gamma\,P(\omega)\}$ and $S_2 := \Omega \setminus S_1$. Observe that $P(S_1) = x$ and $g(x) = Q(S_1)$.

Then, it follows that $\mathsf{D}_\gamma(P\,\|\,Q) = Q(S_1) - \gamma\,P(S_1) = g(x) - \gamma x$, as required.

Figure 2: Using the same example as in Figure 1, consider a line segment $\ell$ with slope $\gamma = 1.2$ touching the curve $g$ at $x$. In this case, the red, orange, and yellow elements $\omega$ to the left of the point $x$ satisfy $Q(\omega)/P(\omega) > 1.2$.
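
As a numerical sanity check (ours, not from the paper), the construction in the proof above can be replayed on the Figure 1 data with $\gamma = 1.2$:

```python
# Sketch replaying the proof of Fact 40 on the Figure 1 data: the set
# S1 = {w : Q(w) > gamma * P(w)} determines the touching point
# (x, g(x)) = (P(S1), Q(S1)), and the y-intercept g(x) - gamma * x
# equals D_gamma(P || Q).

gamma = 1.2
P = [0.0, 0.1, 0.14, 0.11, 0.41, 0.24]
Q = [0.15, 0.3, 0.2, 0.1, 0.25, 0.0]

S1 = [i for i in range(len(P)) if Q[i] > gamma * P[i]]   # the first 3 elements
x = sum(P[i] for i in S1)                                 # 0.24
gx = sum(Q[i] for i in S1)                                # 0.65
d_gamma = sum(max(q - gamma * p, 0.0) for p, q in zip(P, Q))
assert abs((gx - gamma * x) - d_gamma) < 1e-12            # both ~ 0.362
```
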
Proof of Theorem 37.

Suppose $g = \mathsf{Pow}(\alpha^{(0)}\,\|\,\alpha^{(1)})$ is the power function between the given locally maximin pair $\alpha$. For the sake of contradiction, suppose some other refinement pair $\alpha'$ produces a power function $g'$ such that for some $x \in [0,1]$, $g(x) > g'(x)$.

Fix some $\gamma \in \partial g'(x)$ and consider the line segment $\ell$ with slope $\gamma$ touching the curve $g'$ at $x$. By Fact 40, the hockey-stick divergence $\mathsf{D}_\gamma(\alpha')$ is the $y$-intercept of $\ell$.

However, since $g(x) > g'(x)$, the point $(x, g(x))$ lies strictly above $\ell$; hence, to find a line segment with slope $\gamma$ touching the curve $g$, we must move $\ell$ strictly upwards. This implies that $\mathsf{D}_\gamma(\alpha) > \mathsf{D}_\gamma(\alpha')$, contradicting Theorem 32, which states that $\alpha$ is universally closest with respect to $\mathcal{D}_{\mathsf{HS}}$.

References

  • [1] Albert Angel, Nikos Sarkas, Nick Koudas, and Divesh Srivastava. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. Proceedings of the VLDB Endowment, 5(6):574–585, 2012. doi:10.14778/2168651.2168658.
  • [2] Kenneth J Arrow and Gerard Debreu. Existence of an equilibrium for a competitive economy. Econometrica: Journal of the Econometric Society, pages 265–290, 1954.
  • [3] Oana Denisa Balalau, Francesco Bonchi, T.-H. Hubert Chan, Francesco Gullo, and Mauro Sozio. Finding subgraphs with maximum total density and limited overlap. In WSDM, pages 379–388. ACM, 2015. doi:10.1145/2684822.2685298.
  • [4] Nikhil Bansal and Ilan Reuven Cohen. Contention resolution, matrix scaling and fair allocation. In WAOA, volume 12982 of Lecture Notes in Computer Science, pages 252–274. Springer, 2021. doi:10.1007/978-3-030-92702-8_16.
  • [5] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences, 2(1):183–202, 2009. doi:10.1137/080716542.
  • [6] Roger E Behrend. Fractional perfect b-matching polytopes I: General theory. Linear Algebra and its Applications, 439(12):3822–3858, 2013.
  • [7] Aaron Bernstein, Jacob Holm, and Eva Rotenberg. Online bipartite matching with amortized $O(\log^2 n)$ replacements. J. ACM, 66(5):37:1–37:23, 2019. doi:10.1145/3344999.
  • [8] Benjamin Birnbaum, Nikhil R Devanur, and Lin Xiao. Distributed algorithms via gradient descent for Fisher markets. In Proceedings of the 12th ACM conference on Electronic commerce, pages 127–136, 2011. doi:10.1145/1993574.1993594.
  • [9] Richard A Brualdi. Combinatorial matrix classes, volume 13. Cambridge University Press, 2006.
  • [10] Niv Buchbinder, Anupam Gupta, Daniel Hathcock, Anna R. Karlin, and Sherry Sarkar. Maintaining matroid intersections online. In SODA, pages 4283–4304. SIAM, 2024. doi:10.1137/1.9781611977912.149.
  • [11] T.-H. Hubert Chan, Anand Louis, Zhihao Gavin Tang, and Chenzi Zhang. Spectral properties of hypergraph Laplacian and approximation algorithms. J. ACM, 65(3):15:1–15:48, 2018. doi:10.1145/3178123.
  • [12] T.-H. Hubert Chan and Quan Xue. Symmetric splendor: Unraveling universally closest refinements and fisher market equilibrium through density-friendly decomposition. CoRR, abs/2406.17964, 2024. doi:10.48550/arXiv.2406.17964.
  • [13] Moses Charikar. Greedy approximation algorithms for finding dense components in a graph. In Klaus Jansen and Samir Khuller, editors, Approximation Algorithms for Combinatorial Optimization, pages 84–95, Berlin, Heidelberg, 2000. Springer Berlin Heidelberg. doi:10.1007/3-540-44436-X_10.
  • [14] Li Chen, Rasmus Kyng, Yang P. Liu, Richard Peng, Maximilian Probst Gutenberg, and Sushant Sachdeva. Maximum flow and minimum-cost flow in almost-linear time. In FOCS, pages 612–623. IEEE, 2022. doi:10.1109/FOCS54457.2022.00064.
  • [15] Maximilien Danisch, T.-H. Hubert Chan, and Mauro Sozio. Large scale density-friendly graph decomposition via convex programming. In WWW, pages 233–242. ACM, 2017. doi:10.1145/3038912.3052619.
  • [16] Jinshuo Dong, Aaron Roth, and Weijie J Su. Gaussian differential privacy. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(1):3–37, 2022.
  • [17] Cynthia Dwork. Differential privacy. In ICALP (2), volume 4052 of Lecture Notes in Computer Science, pages 1–12. Springer, 2006. doi:10.1007/11787006_1.
  • [18] Irving Fisher. Mathematical investigations in the theory of value and prices, 1892.
  • [19] Satoru Fujishige. Lexicographically optimal base of a polymatroid with respect to a weight vector. Mathematics of Operations Research, 5(2):186–196, 1980. doi:10.1287/MOOR.5.2.186.
  • [20] Ashish Goel, Michael Kapralov, and Sanjeev Khanna. On the communication and streaming complexity of maximum bipartite matching. In SODA, pages 468–485. SIAM, 2012. doi:10.1137/1.9781611973099.41.
  • [21] Andrew V Goldberg. Finding a maximum density subgraph, 1984.
  • [22] Elfarouk Harb, Kent Quanrud, and Chandra Chekuri. Faster and scalable algorithms for densest subgraph and decomposition. Advances in Neural Information Processing Systems, 35:26966–26979, 2022.
  • [23] Elfarouk Harb, Kent Quanrud, and Chandra Chekuri. Convergence to lexicographically optimal base in a (contra)polymatroid and applications to densest subgraph and tree packing. In ESA, volume 274 of LIPIcs, pages 56:1–56:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPICS.ESA.2023.56.
  • [24] Kamal Jain and Vijay V. Vazirani. Eisenberg-gale markets: Algorithms and game-theoretic properties. Games Econ. Behav., 70(1):84–106, 2010. doi:10.1016/J.GEB.2008.11.011.
  • [25] Joseph B Kadane. Discrete search and the Neyman-Pearson lemma. Journal of Mathematical Analysis and Applications, 22(1):156–171, 1968.
  • [26] Samir Khuller and Barna Saha. On finding dense subgraphs. In International colloquium on automata, languages, and programming, pages 597–608. Springer, 2009. doi:10.1007/978-3-642-02927-1_50.
  • [27] Bernhard H. Korte. Combinatorial optimization: theory and algorithms. Springer, 2000.
  • [28] Euiwoong Lee and Sahil Singla. Maximum matching in the online batch-arrival model. ACM Trans. Algorithms, 16(4):49:1–49:31, 2020. doi:10.1145/3399676.
  • [29] Xiangfeng Li, Shenghua Liu, Zifeng Li, Xiaotian Han, Chuan Shi, Bryan Hooi, He Huang, and Xueqi Cheng. Flowscope: Spotting money laundering based on graphs. In Proceedings of the AAAI conference on artificial intelligence, volume 34(04), pages 4731–4738, 2020. doi:10.1609/AAAI.V34I04.5906.
  • [30] Chenhao Ma, Yixiang Fang, Reynold Cheng, Laks VS Lakshmanan, Wenjie Zhang, and Xuemin Lin. Efficient algorithms for densest subgraph discovery on large directed graphs. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pages 1051–1066, 2020. doi:10.1145/3318464.3389697.
  • [31] Ilya Mironov. Rényi differential privacy. In CSF, pages 263–275. IEEE Computer Society, 2017. doi:10.1109/CSF.2017.11.
  • [32] Rajeev Motwani, Rina Panigrahy, and Ying Xu. Fractional matching via balls-and-bins. In APPROX-RANDOM, volume 4110 of Lecture Notes in Computer Science, pages 487–498. Springer, 2006. doi:10.1007/11830924_44.
  • [33] Yurii Nesterov. A method for solving the convex programming problem with convergence rate $O(1/k^2)$. Proceedings of the USSR Academy of Sciences, 269:543–547, 1983. URL: https://api.semanticscholar.org/CorpusID:145918791.
  • [34] Alexander Schrijver et al. Combinatorial optimization: polyhedra and efficiency, volume 24(2). Springer, 2003.
  • [35] Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos. Corescope: Graph mining using k-core analysis – Patterns, anomalies and algorithms. In 2016 IEEE 16th international conference on data mining (ICDM), pages 469–478. IEEE, 2016. doi:10.1109/ICDM.2016.0058.
  • [36] Nikolaj Tatti. Density-friendly graph decomposition. ACM Transactions on Knowledge Discovery from Data (TKDD), 13(5):1–29, 2019. doi:10.1145/3344210.
  • [37] Nikolaj Tatti and Aristides Gionis. Density-friendly graph decomposition. In Proceedings of the 24th International Conference on World Wide Web, pages 1089–1099, 2015. doi:10.1145/2736277.2741119.
  • [38] Salil P. Vadhan and Wanrong Zhang. Concurrent composition theorems for differential privacy. In STOC, pages 507–519. ACM, 2023. doi:10.1145/3564246.3585241.
  • [39] Fang Wu and Li Zhang. Proportional response dynamics leads to market equilibrium. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 354–363, 2007. doi:10.1145/1250790.1250844.
  • [40] Li Zhang. Proportional response dynamics in the fisher market. In ICALP (2), volume 5556 of Lecture Notes in Computer Science, pages 583–594. Springer, 2009. doi:10.1007/978-3-642-02930-1_48.
  • [41] Mingxun Zhou, Elaine Shi, T.-H. Hubert Chan, and Shir Maimon. A theory of composition for differential obliviousness. In EUROCRYPT (3), volume 14006 of Lecture Notes in Computer Science, pages 3–34. Springer, 2023. doi:10.1007/978-3-031-30620-4_1.
  • [42] Mingxun Zhou, Mengshi Zhao, T.-H. Hubert Chan, and Elaine Shi. Advanced composition theorems for differential obliviousness. In ITCS, volume 287 of LIPIcs, pages 103:1–103:24. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.ITCS.2024.103.