Abstract 1 Introduction 2 Preliminaries 3 Composed Correlated Sampling 4 Applications 5 Summary and Open Questions References

Dimension-Free Correlated Sampling for the Hypersimplex

Joseph (Seffi) Naor Technion – Israel Institute of Technology, Haifa, Israel Nitya Raju University of Maryland, College Park, MD, USA Abhishek Shetty Massachusetts Institute of Technology, Cambridge, MA, USA Aravind Srinivasan University of Maryland, College Park, MD, USA Renata Valieva University of Maryland, College Park, MD, USA David Wajc ORCID Technion – Israel Institute of Technology, Haifa, Israel
Abstract

Sampling from multiple distributions so as to maximize overlap has been studied by statisticians since the 1950s. Since the 2000s, such correlated sampling from the probability simplex has been a powerful building block in disparate areas of theoretical computer science. We study a generalization of this problem to sampling sets from given vectors in the hypersimplex, i.e., outputting sets of size (at most) k[n], while maximizing the overlap of the sampled sets. Specifically, the expected difference between two output sets should be at most α times their input vectors’ 1 distance. A value of α=O(logn) is known to be achievable, due to Chen et al. (ICALP’17). We improve this factor to O(logk), independent of the ambient dimension n. Our algorithm satisfies other desirable properties, including (up to a logn factor) input-sparsity sampling time, logarithmic parallel depth and dynamic update time, as well as preservation of submodular objectives. Anticipating broader use of correlated sampling algorithms for the hypersimplex, we present applications of our algorithm to online paging, offline approximation of metric multi-labeling, and swift multi-scenario submodular welfare approximating reallocation.

Keywords and phrases:
Correlated Rounding, Dependent Rounding
Funding:
Joseph (Seffi) Naor: Supported in part by ISF grant 3001/24 and United States – Israel BSF grant 2022418.
Abhishek Shetty: Supported in part by ARO award W911NF-21-1-0328, the Simons Foundation, NSF award DMS-2031883, a DARPA AIQ award, an NSF FODSI Postdoctoral Fellowship and an Apple AI/ML Fellowship.
Aravind Srinivasan: Supported in part by NSF award number CCF-1918749.
Renata Valieva: Supported in part by NSF award numbers CCF-1918749 and CNS-2317194.
David Wajc: Supported in part by the Taub Family Foundation “Leader in Science and Technology” fellowship, Grand Technion Energy Program (GTEP) and ISF grant 3200/24.
Copyright and License:
[Uncaptioned image] © Joseph Naor, Nitya Raju, Abhishek Shetty, Aravind Srinivasan, Renata Valieva, and David Wajc; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing Probabilistic algorithms
Related Version:
Full Version: https://arxiv.org/abs/2511.13573
Editor:
Shubhangi Saraf

1 Introduction

Consider the following natural correlated sampling problem: we wish to sample from numerous discrete distributions over [n], or equivalently round numerous points in the n-dimensional probability simplex, Δn{𝐱[0,1]n:𝐱1=1}, in a consistent manner. In particular, for any two distributions with low total variation distance, i.e., points in Δn with low 1 distance, 𝐱𝐲1i|xiyi|, we wish the sets drawn according to these points to be similar in expectation.

This problem is well-understood [30, 26, 25, 9, 2, 5]. Moreover, efficient correlated sampling algorithms have found numerous applications to disparate areas of computation, from approximation algorithms [30, 25, 9, 37], parallel repetition [26, 4], replicability and privacy [3, 11], parallel sampling [32, 1], cryptography [36], locality sensitive hashing [7, 14], and more.

Such correlated sampling is a special case of what statisticians refer to as coordinated sampling, and have been studying since the 1950s [28]. Here, the motivation of maximizing overlap is in minimizing overhead of initiation interviews for newly sampled interviewees or overhead of hiring new interviewers for new regions with sampled interviewees. Such rationale was used and adopted, for example in the U.S. Bureau of the Census household surveys. See discussion of such applications and nearly half a century of work on such questions in the survey by Ernst [22].

In this work we study the following correlated sampling problem, introduced by [16], generalizing from the probability simplex to the convex hull of the hypersimplex and the origin, i.e., the polytope whose extreme points are vectors in {0,1}n with at most k ones,

Δn,k{𝐱[0,1]n:𝐱1k}.

We wish to round points in this set while preserving marginals, respecting the cardinality constraint of k, and guaranteeing that the expected distance of the output sets for any two points 𝐱,𝐲Δn,k is bounded in terms of the points’ 1 distance. Formally, we study the following problem.

Definition 1.

A distribution 𝒟 over deterministic algorithms 𝒜:Δn,k2[n] mapping points in Δn,k to subsets of [n] of cardinality at most k is an 𝛂-stretch correlated sampling algorithm for Δ𝐧,𝐤 if it satisfies the following properties for all 𝐱,𝐲Δn,k:

  1. (P1)

    (Cardinality) |𝒜(𝐱)|{𝐱1,𝐱1} (hence k) for all 𝒜 in the support of 𝒟.

  2. (P2)

    (Marginals) r𝒜𝒟[i𝒜(𝐱)]=xi for all i[n].

  3. (P3)

    (Stretch) 𝔼𝒜𝒟[|𝒜(𝐱)𝒜(𝐲)|]α𝐱𝐲1.

To clarify Property P3 and its use: after a common initialization step where we draw an algorithm 𝒜𝒟, we guarantee that when rounding any two vectors 𝐱,𝐲Δn,k, the output’s expected distance (over the random choice of 𝒜) is at most the vectors’ 1 distance times some α. Naturally, we wish to keep α as small as possible.

For k=1 it is known that α must be at least two [2, 5],111More precisely, some 𝐱,𝐲 must have stretch at least 2/(1+δ), for δ12𝐱𝐲1, though δ may tend to zero. and this bound is attainable, provided x1=k=1, i.e., for ΔnΔn,1 [30, 26, 25, 9, 2, 32]. (As we show in the full version of the paper, a constant bound of α=3 for Δn,1 more generally follows from the bound for Δn.) For k>1, in contrast, [16] show that an O(logn) stretch is achievable, and use it for applications to metric multi-labelling, which we define later.

The stark difference between the case k=1 and larger k>1 is apparent. The natural question, which we study, is whether this “logarithmic in n” stretch is inherent to the problem.

1.1 Our Contribution

Our main contribution is a correlated sampling algorithm for the hypersimplex with k>1 with stretch logarithmic in k, but independent of the input dimension, n.

Theorem 2.

There exists an O(logk)-stretch correlated sampling algorithm for Δn,k for all n.

Our algorithm is extremely fast, and can be implemented in O(nnzlogn) time, for nnzn the number of nonzeros and logn the (extremely slowly-growing) iterated logarithm function (see Definition 3). Thus, the algorithm’s running time is also very nearly linear in the input sparsity, and dimension independent. Our algorithm can further be implemented in O(lognlogn) parallel depth and O(log(nnz)logn) update time in dynamic settings.

Property P2 immediately implies preservation of linear objectives. We also prove that our algorithm preserves submodular objectives. (See Section 3.4 for some background, but for now it suffices to say that these functions capture the ubiquitous phenomenon of diminishing returns.) In particular, we prove that our correlated sampling algorithm’s output has at least as high a submodular value as independent rounding with marginals 𝐱 (the so-called multilinear extension), while being oblivious to the objective submodular function, and also satisfying the cardinality constraint and desired low stretch guarantees. In the full version of the paper we extend this property to correlated sampling for partition matroids, where the ground set is partitioned into parts, with a cardinality constraint on each part.

Due to the widespread uses of correlated sampling for the probability simplex, we anticipate numerous subsequent applications of our correlated sampling algorithm for the hypersimplex. To illustrate this, we provide several simple applications of our algorithm, to online paging, offline metric multi-labeling, and dynamic submodular welfare maximizing reallocation. As these applications only illustrate the variety of (potential) applications of our correlated sampling for the hypersimplex, we defer their description to Section 4. However, for our last application, we note that our dimension-free stretch hints at further applications: combining it with a simple rounding algorithm for the case that k is large, we can obtain constant stretch, while approximately preserving marginals and submodular objectives. We believe this pattern will benefit future applications of our algorithm.

2 Preliminaries

Notation.

Throughout, we use boldface letters 𝐱,𝐲, etc., and use 𝐱1i|xi| to denote the 1 norm. In our running times, we use the standard iterated (binary) logarithm notation. (Throughout the paper, all logarithms are to the base two.)

Definition 3.

For any n, the i-th iterated and the iterated logarithm are defined as follows.

log(i)n logloglogi timesn,
logn min{ilog(i)n1}.
Sublinear sample time.

Our algorithms’ sampling time is sublinear in the input’s dimension. Specifically, ignoring a modest logn factor, their running time is independent of n and only linear in the sparsity of the input vector 𝐱, i.e., nnz(𝐱)|{ixi0}|. (As 𝐱 will be clear from context, we simply write nnz.) For this, we rely on a sparse representation that allows us to avoid reading the entire vector. In particular, we assume a natural representation allowing us to iterate through the non-zero coordinates of 𝐱 in increasing order in time nnz(𝐱). Another operation we require is O(1)-time computation of standard bit-wise operations, XOR, AND, and NOT (see the full version).

Interface.

Correlated sampling algorithms must use a common random seed for sampling, otherwise we may get different output sets in two different invocations of the sampling step for the same vector 𝐱, violating Property P3 for vectors 𝐱=𝐲. Fittingly, correlated sampling algorithms use a common initialization step, where we draw 𝒜𝒟, setting this randomness, and sampling steps, where we compute 𝒜(𝐱). Note that in all our algorithms and the previous algorithms we make use of, the randomness of the initialization step can implicitly be “spread” over later sampling steps, drawing each random bit the first time it is needed. Therefore, all algorithms presented in this paper have constant initialization time, and so we avoid mentioning this, and only mention the sample time.

Previous Correlated Sampling Algorithms.

For Δn, [30, 26, 25, 9, 2, 32] give 2-stretch correlated sampling algorithms. For k>1, [16] provided an O(logn)-stretch correlated sampling algorithm for Δn,k, though a full proof of the latter is unavailable online.222We thank Roy Schwartz for sharing a preprint of the full version of [16] with us. Moreover, since we are interested in input-sparsity sample time and fast parallel and dynamic implementations of these, in the full version we provide full proofs of these algorithms, and implementations, with guarantees as in the following propositions. We also provide similar guarantees for O(1)-stretch correlated sampling in Δn,1 more generally, by a simple reduction.

Proposition 4.

There exists a 2-stretch O(nnz) sample time correlated sampling algorithm for Δn. The algorithm can be implemented in O(logn) parallel depth and O(n) work and using O(log(nnz))=O(logn) update time in dynamic settings.

Proposition 5.

There exists a correlated sampling algorithm for Δn,k for all n,k1 with sample time O(nnz) and stretch 2logn2logn+2. The algorithm can be implemented in O(logn) parallel depth and O(n) work and using O(logn) update time in dynamic settings.

Our input-sparsity implementation of [16] relies on constant-time lowest-common ancestor function, using which we only process relevant nodes in the postorder traversal of [16].

Bounding Stretch.

Throughout, in our analysis we fix two vectors 𝐱,𝐲Δn,k, and prove that we satisfy all three desired properties, P1-P3. The following well-known observation [16] will simplify our discussion. As usual, 𝐞i is the ith basis vector in the standard basis for n.

Observation 6.

By triangle inequality, a correlated rounding algorithm has stretch α if and only if this stretch of α holds for any two points 𝐱,𝐲=𝐱+ε𝐞iΔn,k, for infinitesimally small ε>0.

3 Composed Correlated Sampling

This section is dedicated to providing, analyzing, and implementing our main correlated sampling algorithm for the hypersimplex, whose main guarantee we restate here.

See 2 We present our algorithm in Section 3.1, showing that it satisfies Properties P1 and P2, and analyze its stretch in Section 3.2. We discuss running times (in various computational models) in Section 3.3. We then prove that the algorithm preserves submodular objectives in Section 3.4.

3.1 The Composed Correlated Sampling Algorithm

We compose several correlated sampling algorithms and additional hashing and random thresholds. We precede its formal description and pseudocode (Algorithms 1, 2, and 3) with a high-level layered description and motivation, interlaced with elements of its analysis.

3.1.1 Overview

Proposition 5 gives a correlated sampling algorithm for Δn,k with stretch O(logn). To obtain a smaller stretch, we aim, in a sense, to “project” the points onto a smaller dimension dn, so as to obtain a stretch O(logd)Θ(logn). Assuming the projection is in some sense invertible, this allows us to “lift” the same improved stretch back to the original (larger) dimension n.

A first attempt.

Consider a vector 𝐱. To try and achieve the desired invertible projection, we hash coordinates into dn buckets. In the pseudocode, d=m3, where these parameters will be clarified below. Assuming (for now) that the sum of the x-values hashed to the same bucket is not greater than one, we give this bucket’s coordinate a weight equal to the sum of the x-values hashed to it. Note that the obtained vector 𝐱^d has the same 1 norm as 𝐱. We now apply a correlated sampling algorithm to the vector 𝐱^ and obtain a subset S of [d] of size 𝐱1 (up to floor or ceiling). We then obtain a subset of [n] with the right marginals, for each bucket b whose coordinate in [d] is output in S, by picking a single coordinate i hashed to bucket b with probability xi/j:h(j)=bxj, using a simple correlated sampling algorithm for the case k=1 (Proposition 4).

A second attempt.

Unfortunately, the assumption that the sum of the x-values hashed to each bucket is at most one might be violated. However, by simple probabilistic arguments, one can show that taking sufficiently large dpoly(k), then small-valued coordinates (whose individual x-values are at most, say, 1/10) hashed to a bucket have sum exceeding one with probability at most 1/poly(d). We then hash large-valued coordinates, of which there are O(k)d, into their own buckets, which are unlikely to collide with buckets to which other coordinates were hashed. Assuming no collisions or heavy buckets, the above is our desired invertible projection. In case of 1/poly(d)-probability problematic events (heavy bucket or collisions), we fall back on some α-stretch algorithm. For now we think of α=O(logn), though generally we can (and do) plug in other values of α. By linearity of expectation, this fall back algorithm contributes α/poly(d) to our algorithm’s stretch, which is O(1), provided dα. (This paragraph motivates our choice of d=poly(m)=poly(k,α).)

The full algorithm.

Finally, our definition of small/large coordinates and heavy buckets can result in using two different correlated sampling algorithms (in different lines) to determine the output for different inputs 𝐱 and 𝐲 which only differ marginally in a single coordinate, i.e., 𝐲=𝐱+ε𝐞i. For example, this coordinate can be small for 𝐱, but large for 𝐲, or the change in this coordinate may result in one bucket being heavy when rounding 𝐲, but not 𝐱. This is extremely problematic, since this can result in our outputs for 𝐱 and 𝐲 being computed using different correlated sampling algorithms, and so these outputs are potentially uncorrelated (different) output sets, despite 𝐱 and 𝐲 being quite close. This results in stretch possibly as high as O(k/ε). To overcome this issue, we randomize our thresholds for items being large or small, and for buckets being heavy. As we show, this results in the extremely problematic event outlined above happening with probability at most ε/k. Therefore, the contribution to the stretch here is again only O(k/ε)ε/k=O(1), and the final stretch is only O(1)+O(logd)=O(logk+logα). Our final desired stretch of O(logk) follows by using this construction recursively, allowing us to plug in increasingly smaller values of α.

3.1.2 Algorithm Description

Our recursive algorithm relies on the correlated sampling algorithms of Propositions 4 and 5, and another correlated-sampling algorithm 𝒜 for Δn,k with stretch α. This algorithm 𝒜 can be the algorithm from Proposition 5, in which case α=2logn, or (as we use in general) another correlated-sampling algorithm – with a smaller α – obtained by recursing this construction. (We later use this algorithm recursively so that in each level the stretch from 𝒜 decreases and the overall stretch achieved is reduced.) At a high level, we rely on the algorithm for k=1 with hashing to compress and map the input point 𝐱Δn,k to a point in a lower-dimensional polytope 𝐱^Δm3,k for some m3n. If we successfully lowered the dimension of the point to round in an invertible manner, we then round this point using the [16] correlated-sampling algorithm from Proposition 5 for this lower dimension m3. When the compression does not succeed at some level of the recursion (due to hash collisions – this happens with small probability), we simply fall back to using algorithm 𝒜 on the original n-dimensional input vector.

Our basic algorithm’s pseudocode is given in Algorithms 1, 2, and 3. On initialization (Algorithm 1) we initialize the randomness used later (in Algorithm 2) by the other correlated-sampling algorithms. In addition, we randomly hash the coordinates into some large m25k5α5 many buckets. We then hash the coordinates and buckets into [m3], and draw two random thresholds σ,τUni[0,1/10]. In Algorithm 2, we call a coordinate i[n] small if 0<xi1/10+σ, else we call it large. If any two buckets or large coordinates are hashed to the same value in [m3] (a hash collision), or if the small coordinates in any one bucket have x-value summing to more than 1τ (an unusually heavy bucket), we simply run Algorithm 𝒜 on the input vector 𝐱. Otherwise, we use Algorithm 3 to “compress” the small coordinates in each bucket (whose sum is less than one), using the k=1 algorithm of Proposition 4: we obtain a single coordinate i called the representative, and have it “absorb” all the x-value of the small coordinates in its bucket – by taking on their x-values’ sum and nullifying all small coordinates’ x-values in its bucket (by resetting those to zero). Finally, using the (bijective) mapping from the remaining non-zero coordinates to [m3] (with representatives using their buckets’ hash), we run the [16] algorithm on a vector 𝐱^ of (smaller) dimension m3. We then invert the bijective mapping from the remaining non-zero coordinates in 𝐱 to [m3] to convert the output subset of [m3] to a subset of [n], and return this set.

Algorithm 1 Initialization. Used before all runs of Algorithm 2
Algorithm 2 Hypersimplex Correlated Sampling.
Algorithm 3 compress(𝐱,S).

3.1.3 First Observations

We note that Algorithm 2, which we denote henceforth by ALG, outputs a set of cardinality at most k, i.e., it satisfies Property P1.

Observation 7.

|ALG(𝐱)|{𝐱1,𝐱1} for all 𝐱Δn,k.

Proof.

Follows by Property P1 of 𝒜, whose output ALG may return in either Line 4 or 6, and by the same property of 𝒜k, since if ALG returns in Line 9, then since ix^i=ixi and |ALG(𝐱)|=|𝒜k(𝐱^)| in this case, we have |ALG(𝐱)|=|𝒜k(𝐱^)|{𝐱^1,𝐱^1}={𝐱1,𝐱1}.

Similarly, we note that the output of Algorithm 2 contains each element i[n] with probability xi, i.e., it satisfies Property P2. In fact, we prove a slightly stronger property. Specifically, for the randomness used by Algorithm 1 in Lines 511, we show the following.

Observation 8.

For any realization r of , element i[n] and vector 𝐱

r[iALG(𝐱)=r]=xi.

Consequently, by total probability, r[iALG(𝐱)]=xi.

Proof.

If =r implies that ALG outputs a set in Lines 4 or 6, then [ALG(𝐱)=r]=𝒜(𝐱), and so by Property P2 of 𝒜,

r[iALG(𝐱)=r]=r[i𝒜(𝐱)]=xi.

If =r implies that ALG outputs a set in Lines 9 and that iS, then x^Ci=xi and by Property P2 of 𝒜k,

r[iALG(𝐱)=r]=r[Ci𝒜k(𝐱^)=r]=x^ci=xi.

Finally, if =r implies that ALG outputs a set in Line 9 and that iS, then iALG(𝐱) iff both iREPb and Rb𝒜k(𝐱^) hold. By Property P2 of 𝒜k and 𝒜1, and since 𝐱^Rb=iSbxi independently of REPb (and generally 𝐱^ is independent of Rb), we have that

r[iALG(𝐱)] =r[Rb𝒜k(𝐱^)=r]r[iREPb=r]
=𝐱^RbxiiSbxi=xi.

The crux of the analysis is in bounding the algorithm’s stretch, i.e., quantifying Property P3.

3.2 Analyzing the Algorithm’s Stretch

Analysis overview

When rounding two vectors 𝐱,𝐲Δn,k, we have three possible scenarios, named in accordance with their intuitively increasingly poor conditional stretch.

Definition 9.

A pair of runs of ALG on vectors 𝐱,𝐲Δn,k (with the same randomness) is

  1. 1.

    Good (G) if ALG outputs a set for both 𝐱 and 𝐲 in Line 9,

  2. 2.

    Bad (B) if ALG outputs a set for both 𝐱 and 𝐲 in either Lines 4 or 6, and

  3. 3.

    Tragic (T) if ALG outputs a set for 𝐱 in Lines 4 or 6 and for 𝐲 in Line 9, or vice versa.

The following observation and subsequent lemma motivate the above terminology.

Observation 10.

For vectors 𝐱,𝐲Δn,k, the sets XALG(𝐱) and YALG(𝐲) satisfy

𝔼[|XY|T] 2k,
𝔼[|XY|B] α𝐱𝐲1.
Proof.

The bound for the tragic event is trivial, since |X|,|Y|k by Observation 7, and thus by triangle inequality, |XY||X|+|Y|2k always. Next, conditioned on the bad event, we have X=𝒜(𝐱) and Y=𝒜(𝐲), so the bound follows by Property P3 of the α-stretch correlated sampling algorithm 𝒜.

Note that the conditional stretch for the tragic event is unbounded, since it holds for arbitrarily small 𝐱𝐲1=ε, as opposed to the bounded stretch conditioned on a bad pair of runs. This justifies our choice of a more positive term for the latter event. We now justify the choice of best term for the good event (G): we show that it contributes O(logm)=O(logα+logk) to the stretch, which is o(α) unless 𝒜 is already an algorithm with our target stretch of O(logk). As we show later in Lemma 14, we have that r[G]=1o(1), and so this also implies that good runs have the smallest expected stretch of all three types of runs.

Lemma 11.

For vectors 𝐱,𝐲Δn,k, the sets XALG(𝐱) and YALG(𝐲) satisfy

E[|XY|G]r[G]=O(logk+logα)𝐱𝐲1.
Proof (Sketch).

By Proposition 5, the output sets have distance O(logm3)𝐱^𝐲^1=O(logk+logα)𝐱^𝐲^1 conditioned on the fact that runs for 𝐱 and 𝐲 are good. Therefore we wish to bound 𝔼[𝐱^𝐲^1G] in terms of 𝐱𝐲1=ε. In the case that xi and yi are both small, or are both large, 𝐱^ and 𝐲^ differ by a single coordinate, hence 𝐱^𝐲^1=𝐱𝐲1=ε. However if yi is large but xi is small, then 𝐱^𝐲^1=2xi+ε=O(1). As σ is randomly chosen, this last event happens with probability O(ε)=O(𝐱𝐲1), which then implies our desired bound by total expectation. See full version of the paper for details.

Given the preceding bounds on the expected contribution to the stretch of good runs, we wish to upper bound the probability of a pair of runs on vectors 𝐱 and 𝐲 being bad or tragic. To this end, we first upper bound the probability of Algorithm 2 running 𝒜 on the original input vector, by proving that the probability of a hash collision is low, conditioned on any realization of σ.

Lemma 12.

For any 𝐱Δn,k and possible realization s of σ,

r[h[m3] such that |{iSCi=h,xi>0}{bRb=h}|>1|σ=s]1m.
Proof.

Since ixik, the number of elements iL is at most k1/10+σ10k, while the number of buckets is m. The number of pairs of such is therefore at most (10k+m2). Since hashing is done uniformly, the probability of any pair colliding is 1m3. By union bound over all pairs and using m=25k5α510k/(21), and so 10k+m2m, the probability of a hash collision is at most

(10k+m2)1m3 (10k+m)22m31m.

Next, we show that under the same conditioning, the probability of any bucket b being heavier than 451τ is likewise small.

Lemma 13.

For any 𝐱Δn,k and possible realization s of σ,

r[b[m] such that iSbxi>45|σ=s]1m.
Proof (Sketch).

We divide small items into tiny items, having x value at most some λ=O(1/logm), and little items. For some b to have x value of small items exceeding 4/5 requires either the sum of x-values of tiny items in b to exceed its expectation by some constant, or at least some constant number C of coarse items to belong to b. Both events have probability polynomially small in m, the former by the choice of λ and Chernoff bounds, and the latter by union bound over the (few) C-tuples of the (somewhat few) little items. Union bounding over these events and over all b[m] then gives the lemma. See the full version for details.

Using the preceding lemmas, we can now upper bound the probabilities of the bad and tragic events.

Lemma 14.

The probability that a pair of runs of Algorithm 2 on 𝐱 and 𝐲 is bad or tragic satisfies

r[B] 2m,
r[T] 30εm.
Proof (Sketch).

The first bound follows by the union bound and the previous two lemmas. In contrast, if a pair of runs is tragic, then Algorithm 2 terminates in Line 9 for one input, but terminates early (Lines 4 or 6) for the other, due to either a hash collision or a heavy bucket. All (three) cases causing such an event require one of the 1/m-probability events from the previous lemmas to occur, and the random thresholds for an item being large or for a bucket being full to fall in a range of width at most ε=𝐱𝐲|1. But due to σ,τUni[0,1/10], any one of these three bad cases then occurs with probability 10ε/m. The upper bound on r[T] then follows by the union bound. See the full version for details.

With the above in place, we are now ready to bound the expected stretch of Algorithm 2.

Theorem 15.

For all 𝐱,𝐲Δn,k, the sets XALG(𝐱) and YALG(𝐲) satisfy

𝔼[|XY|]=O(logk+logα)𝐱𝐲1.
Proof.

By Observation 6, we can assume that 𝐱𝐲1=ε. By total expectation over the good, bad and tragic events, using Lemmas 11 and 14, we get:

𝔼[|XY|] =𝔼[|XY|G]r[G]+𝔼[|XY|B]r[B]+𝔼[|XY|T]r[T]
O(logk+logα)𝐱𝐲1+α2m𝐱𝐲1+2kε30εm𝐱𝐲1
O(logk+logα)𝐱𝐲1,

where the last inequality follows from our choice of m satisfying m2α and m60k.

Taking Algorithm 𝒜 in Algorithm 2 to be the α=O(logn)-stretch algorithm of Proposition 5, Theorem 15 immediately implies an O(logk+loglogn)-stretch correlated sampling algorithm. We improve on this bound and obtain an O(logk) stretch by recursing our construction. We start by describing the construction.

Our recursive construction.

For the base case, 𝒜0 is the algorithm of Proposition 5. For all i1 we let 𝒜i be Algorithm 2 with Algorithm 𝒜 in Lines 4 and 6 chosen to be 𝒜i1.333We can also consider running 𝒜i1 in Line 9, but we focus on the simpler recurrence. We denote by αi the stretch of algorithm 𝒜i. By Proposition 5, α02logn+2. On the other hand, by Theorem 15, for general i we have for some constant C>1 the recurrence

αi+1C(logk+logαi).

Using this recursive construction we obtain our claimed O(logk) stretch, as follows.

Theorem 2. [Restated, see original statement.]

There exists an O(logk)-stretch correlated sampling algorithm for Δn,k for all n.

Proof.

Let p3Clogk+10C2, and note that 10C2/log(10C2)3C for all C>1. Consequently, since x/logx is increasing for x10C2>e, we have that if αip10C2, then αi/3Clogαi. Moreover, if αip, then trivially αi/3p/3Clogk. Combining the above we have that if αip then 2αi/3C(logk+logαi)αi+1. Thus, since α0=O(logn), for ilog3/2(α0)=O(loglogn), Algorithm 𝒜i has stretch αip=O(logk).

By the preceding proof, to get an O(logk) stretch we only need a recursion depth of O(loglogn). In the following section we refine this bound and consider other computational aspects of implementations of Algorithm 2 and our derived recursive correlated sampling algorithm.

3.3 Computational Considerations

In our preceding proof of Theorem 2, we only used the fact that the stretch decreases by a constant factor until it is below some O(logk) term. However, since the dependence on n in the stretch of each algorithm 𝒜i is asymptotically logarithmic in the stretch of the preceding algorithm 𝒜i, we can show that the first iterations decrease the stretch significantly faster and so significantly fewer levels of recursion are necessary to attain stretch O(logk), which translates into speedups for our algorithm.

Lemma 16.

The i-th algorithm 𝒜i for all ilogn2 has stretch αi=O(ilogk+log(i)n).

The proof, which is a simple inductive argument, is deferred to the full version. We also show there that after logn levels of recurrence, giving a stretch of O(logklogn), by the halving argument from our previous proof of Theorem 2, we then decrease the stretch by a further logn factor in only O(loglogn) more levels of recursion, yielding the following.

Theorem 17.

For some i=logn+O(loglogn)=O(logn), Algorithm 𝒜i has stretch O(logk).

 Remark 18.

One can show that the more involved recurrence mentioned in Footnote 3 converges exponentially faster: it allows to double the number of iterated logarithms in each level, thus improving on Lemma 16, yielding a stretch αi=O(ilogk+log(𝟐𝐢)n) for i=O(loglogn), and thus this number of recursive calls (asymptotically) suffices for a stretch of O(logk). For simplicity of exposition, we omit the details.

In the remainder of this section we discuss the computational ramifications of Theorem 17 in various computational models, the most immediate one being the classic centralized setting. In all the following theorems, we note that

Theorem 19 (Near-Input-Sparsity Sample Time).

There exists an O(logk)-stretch correlated sampling algorithm for Δn,k with sample time O(nnzlogn) in the worst case.

Proof.

By Theorem 17, we can focus on implementing 𝒜i for some i=O(logn). Compression in each of the O(logn) levels of the recurrence requires O(nnz) time, since we spend time proportional to each bucket’s vector, by Proposition 4. The final call to the rounding algorithm of Proposition 5 (for whichever dimension dn we end up using) takes a further O(nnz) time. Finally, returning from the recurrence and lifting the output subset to a larger universe requires O(nnz) time in each level, for a further O(nnzlogn) time.

Our recurrence depth together with implementations of the basic correlated sampling algorithms (Propositions 4 and 5) also implies that our algorithm can be (i) parallelized (at the cost of increasing the sample time to be slightly superlinear in n), with depth near logarithmic. Similarly, and that it can (ii) be dynamized with the near-logarithmic same update time. (See the full version).

Theorem 20 (Parallel Implementation).

There exists an O(logk)-stretch correlated sampling algorithm for Δn,k with sampling time O(nlogn) and depth O(lognlogn).

Theorem 21 (Dynamic Implementation).

There exists an O(logk)-stretch correlated sampling algorithm for Δn,k with update time O(lognlogn).

3.4 Submodular Dominance

Since correlated sampling algorithms preserve marginals (Property P2), they naturally preserve linear objectives. In this section we show that our rounding scheme also preserves subomdular objectives that capture the ubiquitous notion of diminishing returns. We recall some basic definitions.

3.4.1 Submodularity: Technical Background

Definition 22.

A set function f:2E is submodular if it satisfies the diminishing returns property, namely

f(eS)f(eT)STE{e},

where f(eS)f(S{e})f(S) denotes the marginal value (measured by f) of adding e to S.

As we are interested in rounding-based algorithms, we need a way to extend (binary-valued) submodular functions to real vectors. We One natural such extension, corresponding to independent rounding, is the multilinear extension of [13].

Definition 23.

The multilinear extension F:[0,1]E of a set function f:2E is given by

F(𝐱)SEf(S)iSxiiS(1xi).

The maximum multilinear objective subject to any solvable polytope (i.e., a polytope over which one can optimize linear objectives efficiently) can be (11/e)-approximated in polynomial time [13]. This is optimal even subject to cardinality constraints, under standard complexity-theoretic assumptions [23] and information-theoretically given only value oracle access to f [34, 33].

Given the above approximability of the multilinear extension, it is natural to wish to round vectors 𝐱 while preserving the given constraints, as well as preserving the multilinear objective. This is known to be achievable for matroid constraints and other constraints if f is known [13, 15]. An objective-oblivious counterpart to the above is given by the following definition.

Definition 24.

A random vector 𝐗{0,1}E satisfies submodular dominance if for any submodular function f:2E, we have that

𝔼[f(𝐗)]F(𝔼[𝐗]).

It is known that tree-based pivotal sampling [38] (the core algorithm used by [16]) satisfies submodular dominance. This follows since tree-based pivotal sampling satisfies negative association (and more) [6, 19, 21, 12], and the latter is known to imply submodular dominance [35, 17]. We provide some relevant background on negative association here.

Definition 25.

A random vector 𝐗 is negatively associated (NA) if for any two increasing functions f,g of disjoint variables in 𝐗, we have Cov(f,g)0.

A simple example of the above is given by the so-called 0-1 lemma [20].

Proposition 26.

Any random vector 𝐗{0,1}n with iXi1 is NA.

It is well-known that NA is closed under products [27].

Proposition 27.

If independent vectors 𝐗 and 𝐘 are NA, then so is their concatenation, 𝐗𝐘.

A more involved example of NA is the output of [16]. This follows since the latter is the output of pivotal sampling [38, 18] (albeit with correlated random seed) with a tree order (see the full version), whose output is known to satisfy NA [6, 12, 31].

Proposition 28.

The output of [16] is NA.

Recall that our interest in NA is due to its implication of submodular dominance [17, 35].

Proposition 29.

Let f be a submodular function and 𝐗 an NA vector. Then, 𝔼[f(𝐗)]F(𝔼[𝐗]).

3.4.2 Submodular Dominance of Algorithm 2

In this section we show that the output of Algorithm 2 satisfies submodular dominance, using the connection of the latter to negative association.

Theorem 30.

The output of Algorithm 2 with 𝒜 and 𝒜k satisfying submodular dominance itself satisfies submodular dominance.

Proof.

Fix a realization r of . This in particular fixes the definition of large and small items, the buckets and the mappings. If r is such that we return a set in Lines 4 or 6, then our output is 𝒜k(𝐱), which satisfies submodular dominance by Proposition 28, and so by Property P2 of 𝒜k,

𝔼[f(ALG(𝐱))=r]=𝔼[f(𝒜k(𝐱))=r]F(𝔼[𝒜k(𝐱)=r])=F(𝐱).

Otherwise, given T=𝒜(𝐱^), our Algorithm 2 outputs in Line 9 a set

S(T){iCiT}b:RbTRepb.

To argue submodular dominance, we define the following auxiliary submodular function:

f^(T)(i1,,ir)(b1,,br):Tb{Rb}={Rb1,,Rbr}f((S(T)L)j=1r{ij})j=1rxijx^Rb.

Since the output of 𝒜 is independent of Repb, for any realization of T we can imagine that (independently) for each RbT, some Repb is drawn from b according to the discrete distribution (xi/x^Rb)ib, by Property P2 of 𝒜0. Thus f^(T)=𝔼[f(S(T))] according to this sampling of {Repb}b. So, if we let I{(i1,,ir)Sri1,,ir belong to distinct b}, then taking total probability over T, expanding f^(), and using the subomdular dominance of 𝒜, we have:
𝔼[f(S(T))] =𝔼[f^(T)] T[m3]f^(T)iTxii[m3]T(1xi) =SL(i1,,ir)If(Lsj=1r{ij})iSxiiLS(1xi)j=1rxijb:Rbj{ij}=(1x^Rb).

Fortunately, this unwieldy expression has a simple interpretation: this is precisely the subomdular objective obtained by sampling each large item i independently with probability xi, and sampling at most one element i per bucket b with probability xi, independently of the large items and other buckets. But by Propositions 26 and 27, this distribution is NA, and since NA implies submodular dominance [17, 35], we obtain submodular dominance conditioned on =r. By Observation 8, this implies that 𝔼[f(ALG(𝐱))=r]F(ALG(𝐱)). The subomdular dominance for the unconditional distribution then follows since by total expectation over .

4 Applications

Recall that correlated sampling algorithms for Δn have found numerous varied applications. In this section we discuss three applications of our rounding scheme: (i) an application to rounding fractional online paging algorithms, which yields worse algorithms than known, yet is simple and hints at more applications in online settings, (ii) an application to metric multi-labeling via a framework of [16], improving their approximation from O(logn) to O(logk), and (iii) a more complicated application to collusion-free and swift submodular welfare maximizing reallocation in dynamic settings. We conclude the latter with a discussion of combining dimension-free bounds with a simple rounding scheme to effectively obtain constant stretch, at the cost of only nearly achieving the marginals (Property P2), and hence incurring a (1ε) loss in submodular objectives.

We emphasize that our main message here is the variety of the applications, which we believe hints at potential future wide-ranging applications of our more general correlated sampling machinery.

4.1 Online Paging

Recall that in the (online) paging problem, a cache of size k is given, able to store any of n pages. At each time t a page it[n] is requested. If it is not in the cache and the cache is full, this causes a cache miss and some other page must be evicted from the cache to make room for it. The objective is to minimize the number of evictions (hence cache misses), compared to the hindsight-optimal solution, measured in terms of the (multiplicative) competitive ratio.

A natural LP (linear programming) relaxation of the problem, with decision variables yi,t[0,1] corresponding to the extent to which page i is in the cache at time t, and zi,t=|yi,tyi,t1| the extent to which page i is evicted at time t, is as follows.

min itzi,t
s.t. yit,t1 t
yi,tyi,t1zi,t i,t
yi,tyi,t1zi,t i,t
yi,t0 i,t.

No fractional online algorithm is better than O(logk)-competitive with respect to this LP, and this ratio can be attained using the online primal-dual method [8]. We show that our correlated rounding algorithms provide O(log2k)-competitive randomized integral algorithms. While better randomized algorithms with competitive ratio O(logk) have been known since the ’90s [24], this hints at broader applicability of our correlated sampling algorithm for online problems.

Rounding fractional caches using correlated sampling.

To round fractional caching algorithms we use our O(logk)-stretch correlated sampling Algorithm 2 for 𝒫n,k, abbreviated by 𝒜. For 𝐱(t), the fractional cache at time t, our integral cache at time t is simply 𝒜(𝐱(t)). The random cache at each time t is feasible, since (i) |𝒜(𝐱(t))|k by Property P1, and (ii) r[it𝒜(𝐱(t))]=yi,t by Property P2, thus implying that this cache contains page it with probability one. On the other hand, the expected number of page misses at time t is, by Property P3, at most |𝒜(𝐱(t))𝒜(𝐱(t1))|=O(logk)𝐱(t)𝐱(t1)1=O(logk)i,tzi,t. Therefore, by linearity of expectation, the obtained (feasible) online paging algorithm incurs at most O(logk)i,tzi,t evictions in expectation. Therefore, since the fractional online algorithm’s objective i,tzi,tO(logk)OPT, the randomized algorithm is at most O(log2k) competitive.

4.2 Metric Multi-Labeling

Chen et al. [16] introduced the correlated sampling problem which we study. Their motivation comes from multi-label classification problems, in which labels need to be assigned to objects, e.g., news articles, given some observed data. They considered the case where multiple labels can be assigned to objects, called the metric multi-labeling problem. It arises in various settings, e.g., classification of textual data such as web pages, semantic tagging of images and videos, and functional genomics. The assignment of labels to objects should be done in a manner that is most consistent with the observed data, from which two important ingredients are derived. The first is an assignment cost for every (object,label) pair, reflecting a recommendation given by a local learning process which infers label preferences of objects. The second is similarity information on pairs of objects, giving rise to separation costs incurred once different label sets are assigned to a pair of similar objects. The goal is to find a labeling that minimizes a global cost function, while taking into account both local and pairwise information.

Chen et al. [16] considered the setting in which the number of labels that an object can receive is at most kn. They formulated the metric multi-labeling problem in this setting as a linear program that maps the m objects into a set of (fractional) vectors 𝐲1,,𝐲mΔn,k, where the ith entry of vector j indicates the fraction of label i that object j receives. The objective function minimizes the global cost function of the (fractional) labeling, i.e., the sum of the assignment costs and the separation costs. The following lemma is proved in [16].

Proposition 31.

If there is a polynomial-time α-stretch correlated sampling algorithm for Δn,k, then the metric multi-labeling problem admits a polynomial-time α-approximation algorithm.

The proof of the lemma is based on the following observations: (1) the preservation of the marginals in the correlated sampling algorithm guarantees that assignment costs are preserved in expectation; and (2) the α-stretch implies that separation costs are preserved in expectation with a loss of a factor α. As discussed in detail above and in the full version of the paper, [16] provided a polynomial-time O(logn)-stretch algorithm, and from this they obtain an O(logn)-approximate algorithm for metric multi-labeling. Our main result that improves the stretch to O(logk) then implies the same improved approximation ratio for this problem. The latter is an asymptotic improvement of interest, as k is typically much smaller than n for this problem.

4.3 Swift and Collusion-Free Reallocation

We now turn to our most involved application. Part of our arguments are inspired by and build upon recent unpublished work of [10]. They studied the problem of low-recourse cardinality-constrained submodular maximization in a dynamic setting with unknown future.

In contrast, we show how our correlated sampling can be used to provide similar guarantees that are history-independent, and therefore collusion-resistant, given a sequence of potential scenarios that might require swift reallocation. This application relies on all properties of our algorithm, including submodular dominance (Theorem 30), and dimension-free stretch O(logk). We show that the latter property combines nicely with a simple constant-stretch algorithm which approximately preserves marginals and submodular objectives up to an additional (1ε) factor for the regime that kpoly(1/ε) (by simple concentration arguments): this combination results in O(log(1/ε)-stretch at the cost of a (1ε) loss in the objective, by running the appropriate (near-)correlated sampling algorithm depending on whether kpoly(1/ε) or not. We turn to describing the problem we study in this section.

Submodular welfare maximization.

In the basic submodular welfare (SWF) maximization problem, we have n buyers and m sellers. Each seller s sells some number ks of homogeneous items from some universe U. Each buyer b in turn has a monotone submodular valuation function fb:U+ (i.e., fb(S)fb(T) for every STU). The objective is to allocate these items, at most one of each, to the buyers, so as to maximize the social welfare, bfb(Sb), for SbU the set of items allocated to b. We can encode the constraint that at most ks items sold by s are allocated, at most one of which to each buyer, by requiring the (possibly fractional) allocation vector 𝐱 with (s,b) to be a concatenation of m points in hypersimplexes, Δn,ks with appropriate ks.

bxb,sks s
0xb,s1 b,s.

SWF is NP-hard to approximate within a factor of (11/e) [29]. There are many approaches to attain this approximation factor in polynomial time. Most relevant to our subsequent dynamic SWF reallocation problem is the relax-and-round approach, as follows: the continuous greedy algorithm can be used to provide a (11/e)-approximation to the maximum multilinear extension subject to any solvable LP constraints, such as the above linear constraints [13]. One can then round the vectors using Algorithm 2, whose output satisfies the hypersimplex constraints (Property P1, matches the marginals (Property P2) and satisfies submodular dominance (see Theorem 30) (see Proposition 28), applying this algorithm to each sub-vector 𝐱s with coordinates {(s,b)b[n]}, corresponding to the cardinality constraint of ks for items sold by s. By the algorithm’s properties it satisfies all constraints, and as shown in the full version of the paper, this also satisfies submodular dominance for the entire vector (with the target marginals), and so the obtained value is at least as high as the multilinear extension, which gives us a (11/e)-approximation [13].

The dynamic problem.

Suppose now that we have r possible scenarios, corresponding to some buyers or sellers leaving or entering the market. Let SWF(i) denote the optimal SWF of scenario i. We wish to provide a good approximation of SWF(i), while guaranteeing swift reallocation when switching from one scenario to the other. In particular, we wish to provide a β approximation (to be chosen shortly) for each configuration, while minimizing the maximum number of items reallocated. This can be captured by the following LP constraints, where xb,s,i is the fractional allocation to buyer b of items of seller s under scenario i:

min R (MMD-LP)
s.t. b,szb,s,i,jR i,j
xb,s,ixb,s,jzb,s,i,j b,s,i,j
xb,s,jxb,s,izb,s,i,j b,s,i,j
bxb,sks s scenario i
bxb,s0 s scenario i
𝐱𝟎.

The above formulation is still missing the SWF-approximation. Building on the approximate-or-separate method of [10] (as outlined in the full version), we can provide in polynomial-time a solution to the above constraint subject to the multilinear extension of the objective submodular function over 𝐱i(xb,s,i)b,s (the fractional assignment for scenario i) being a (11/eε)2-approximation of SWF(i) for each i. The question remains: how do we round all these vectors in a way to obtain bona fide integral solutions with low movement cost? [10] provide an approach that results in O(R) movement cost (i.e., a constant-approximation of the optimal movement cost). Unfortunately, their algorithm is history dependent, which may incentivize buyers and sellers to time their arrivals and departures. Our correlated sampling algorithm, in contrast, is history-independent. Using our correlated sampling algorithm, which satisfies the hard cardinality constraints (Property P1), the target marginals (Property P2), O(logk) stretch (Theorem 2) and submodular dominance (Theorem 30), we obtain an algorithm with (11/eε)2-approximation, using movement cost O(Rlogk), i.e., O(logk) times that of an optimal algorithm.

4.4 Discussion: From Dimension-Free to Constant

The above stretch for collusion-free reallocation can in fact be decreased to a constant for any constant ε>0, and specifically to O(log(1/ε)), albeit at the cost of a (negligible) (1ε) deterioration in approximation, and losing the history independence. Since this is subsumed by the history-dependent O(1)-stretch rounding of [10], we only outline the idea here briefly.

By sampling a uniform threshold for each coordinate, we can decide which elements to take to our solution (i.e., which item to sell to which buyer), using stretch O(1): when xiτi for i.i.d., τiUni[0,1], we add i to the output. This yields the correct marginals independently, and so yields a set of expected value exactly equal to the multilinear extension, F(𝐱), and moreover results in a set of expected size 𝐱k. To make this a feasible set, we can scale down the value by a factor of 1+Θ(k), thus obtaining a set of expected size kΘ(klnk). By simple concentration arguments, the probability that this set exceeds size k is polynomially small in k, and by submodularity, one can show that taking only a subset of the sampled elements if more than k are sampled will only incur an expected additive loss of (1/poly(k))f(OPT). Combined with the (1Θ(klnk)) multiplicative loss from scaling down, this is a 1ε multiplicative factor and εf(OPT) additive factor, provided k=Ω(log(1/ε)1/ε2) is sufficiently large. On the other hand, by only keeping a subset of cardinality min{k,|S|} of the sampled elements S, it is easy to obtain stretch O(1). Thus, we get a (1ε) loss in objective (provided the target value is Ω(f(OPT))) with O(1) stretch if k is larger than some poly(1/ε). Alternatively, if k is smaller than poly(1/ε), then by applying this paper’s main correlated sampling algorithm, we get no loss in the objective value, but stretch O(logk)=O(log(1/ε)), i.e., constant for any constant ε>0. In either case, we incur a stretch no worse than a 1ε deterioration in approximation quality, and a constant stretch of O(log(1/ε)). We believe this pattern will find future applications.

5 Summary and Open Questions

In this work we revisit the correlated sampling question for the hypersimplex, Δn,k, and provide dimension-free stretch guarantees for the latter. We provide stretch O(logk) using a recursive algorithm of depth O(logn).

We note that any dimension-dependent stretch f(n)logn should imply by similar approaches an O(f(k)) stretch with O(f(n)) levels of recurrence, where f(n)min{if(i)(n)1} is “the iterated f”, defined using the ith iteration of f, namely f(i)(x)f(f(i1)(x))𝟙[i>0]+x𝟙[i=0]. However, better than logarithmic dependence on n is as yet unknown. Is this inherent, and is Θ(logk) the optimal stretch given Properties P1 and P2? We leave this as a tantalizing open question.

We presented a number of applications, to online paging, metric multi-labeling and swift and history-independent dynamic reallocation for approximate submodular welfare maximization. While we do not see any of these applications as particularly compelling on their own, the variety of applications hints at further applications of our correlated sampling machinery. In particular, we discussed in Section 4.4, the dimension-independent stretch can, for some applications, be combined with simple rounding schemes to obtain constant stretch (while only approximately preserving marginals). We are optimistic that similarly to correlated sampling for the probability simplex, such (dimension-free) correlated sampling for the hypersimplex will find broader applications. We leave the search for such applications as a direction for future research.

References

  • [1] Nima Anari, Ruiquan Gao, and Aviad Rubinstein. Parallel sampling via counting. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 537–548, 2024. doi:10.1145/3618260.3649744.
  • [2] Omer Angel and Yinon Spinka. Pairwise optimal coupling of multiple random variables. arXiv preprint arXiv:1903.00632, 2019.
  • [3] Ghazi Badih, Kumar Ravi, and Manurangsi Pasin. User-level private learning via correlated sampling. Advances in Neural Information Processing Systems, 2021.
  • [4] Boaz Barak, Moritz Hardt, Ishay Haviv, Anup Rao, Oded Regev, and David Steurer. Rounding parallel repetitions of unique games. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 374–383, 2008. doi:10.1109/FOCS.2008.55.
  • [5] Mohammad Bavarian, Badih Ghazi, Elad Haramaty, Pritish Kamath, Ronald L Rivest, and Madhu Sudan. Optimality of correlated sampling strategies. Theory of Computing, 16(1), 2020. doi:10.4086/TOC.2020.V016A012.
  • [6] Petter Brändén and Johan Jonasson. Negative dependence in sampling. Scandinavian Journal of Statistics, 39(4):830–838, 2012.
  • [7] Andrei Z Broder. On the resemblance and containment of documents. In Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No. 97TB100171), pages 21–29, 1997. doi:10.1109/SEQUEN.1997.666900.
  • [8] Niv Buchbinder, Kamal Jain, and Joseph (Seffi) Naor. Online primal-dual algorithms for maximizing ad-auctions revenue. In Proceedings of the 15th Annual European Symposium on Algorithms (ESA), pages 253–264. 2007.
  • [9] Niv Buchbinder, Joseph (Seffi) Naor, and Roy Schwartz. Simplex partitioning via exponential clocks and the multiway-cut problem. SIAM Journal on Computing (SICOMP), 47(4):1463–1482, 2018. doi:10.1137/15M1045521.
  • [10] Niv Buchbinder, Joseph (Seffi) Naor, David Wajc, et al. Chasing submodular objectives, and submodular maximization via cutting planes. arXiv preprint arXiv:2511.13605, 2025.
  • [11] Mark Bun, Marco Gaboardi, Max Hopkins, Russell Impagliazzo, Rex Lei, Toniann Pitassi, Satchit Sivakumar, and Jessica Sorrell. Stability is stable: Connections between replicability, privacy, and adaptive generalization. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 520–527, 2023. doi:10.1145/3564246.3585246.
  • [12] Jarosław Byrka, Piotr Skowron, and Krzysztof Sornat. Proportional approval voting, harmonic k-median, and negative association. In Proceedings of the 45th International Colloquium on Automata, Languages and Programming (ICALP), page 26, 2018.
  • [13] Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing (SICOMP), 40(6):1740–1766, 2011. doi:10.1137/080733991.
  • [14] Moses S Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pages 380–388, 2002. doi:10.1145/509907.509965.
  • [15] Chandra Chekuri, Jan Vondrak, and Rico Zenklusen. Dependent randomized rounding via exchange properties of combinatorial structures. In Proceedings of the 51st Symposium on Foundations of Computer Science (FOCS), pages 575–584, 2010.
  • [16] Shahar Chen, Dotan Di Castro, Zohar Karnin, Liane Lewin-Eytan, Joseph Seffi Naor, and Roy Schwartz. Correlated rounding of multiple uniform matroids and multi-label classification. In Proceedings of the 44th International Colloquium on Automata, Languages and Programming (ICALP), 2017.
  • [17] Tasos C Christofides and Eutichia Vaggelatou. A connection between supermodular ordering and positive/negative association. Journal of Multivariate analysis, 88(1):138–151, 2004.
  • [18] Jean-Claude Deville and Yves Tille. Unequal probability sampling without replacement through a splitting method. Biometrika, 85(1):89–101, 1998.
  • [19] Devdatt Dubhashi, Johan Jonasson, and Desh Ranjan. Positive influence and negative dependence. Combinatorics, Probability and Computing, 16(01):29–41, 2007. doi:10.1017/S0963548306007772.
  • [20] Devdatt Dubhashi and Desh Ranjan. Balls and bins: A study in negative dependence. Random Struct. Algorithms, 13(2):99–124, 1998. doi:10.1002/(SICI)1098-2418(199809)13:2\%3C99::AID-RSA1\%3E3.0.CO;2-M.
  • [21] Devdatt P Dubhashi, Volker Priebe, and Desh Ranjan. Negative dependence through the FKG inequality. BRICS Report Series, 3(27), 1996.
  • [22] Lawrence R Ernst. The maximization and minimization of sample overlap problems: a half century of results. In Bulletin of the International Statistical Institute, Proceedings, volume 58, pages 293–296, 1999.
  • [23] Uriel Feige. A threshold of lnn for approximating set cover. Journal of the ACM (JACM), 45(4):634–652, 1998. doi:10.1145/285055.285059.
  • [24] Amos Fiat, Richard M Karp, Michael Luby, Lyle A McGeoch, Daniel D Sleator, and Neal E Young. Competitive paging algorithms. Journal of Algorithms, 12(4):685–699, 1991. doi:10.1016/0196-6774(91)90041-V.
  • [25] Dongdong Ge, Simai He, Yinyu Ye, and Jiawei Zhang. Geometric rounding: a dependent randomized rounding scheme. Journal of combinatorial optimization, 22:699–725, 2011. doi:10.1007/S10878-010-9320-Z.
  • [26] Thomas Holenstein. Parallel repetition: simplifications and the no-signaling case. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing (STOC), pages 411–419, 2007. doi:10.1145/1250790.1250852.
  • [27] Kumar Joag-Dev and Frank Proschan. Negative association of random variables with applications. The Annals of Statistics, pages 286–295, 1983.
  • [28] Nathan Keyfitz. Sampling with probabilities proportional to size: adjustment for changes in the probabilities. Journal of the American Statistical Association, 46(253):105–109, 1951.
  • [29] Subhash Khot, Richard J Lipton, Evangelos Markakis, and Aranyak Mehta. Inapproximability results for combinatorial auctions with submodular utility functions. In Proceedings of the 1st Conference on Web and Internet Economics (WINE), pages 92–101, 2005. doi:10.1007/11600930_10.
  • [30] Jon Kleinberg and Éva Tardos. Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. Journal of the ACM (JACM), 49(5):616–639, 2002. doi:10.1145/585265.585268.
  • [31] Josh Brown Kramer, Jonathan Cutler, and AJ Radcliffe. Negative dependence and srinivasan’s sampling process. Combinatorics, Probability and Computing, 20(3):347–361, 2011. doi:10.1017/S0963548311000095.
  • [32] Hongyang Liu and Yitong Yin. Simple parallel algorithms for single-site dynamics. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 1431–1444, 2022. doi:10.1145/3519935.3519999.
  • [33] Vahab Mirrokni, Michael Schapira, and Jan Vondrák. Tight information-theoretic lower bounds for welfare maximization in combinatorial auctions. In Proceedings of the 9th ACM conference on Electronic commerce, pages 70–77, 2008. doi:10.1145/1386790.1386805.
  • [34] George L Nemhauser and Laurence A Wolsey. Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research (Math of OR), 3(3):177–188, 1978. doi:10.1287/MOOR.3.3.177.
  • [35] Frederick Qiu and Sahil Singla. Submodular dominance and applications. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM), 2022.
  • [36] Ronald L Rivest. Symmetric encryption via keyrings and ecc. In Northernmost Crypto Workshop, Longyearbyen, Norway, Midnight Lect, 2016.
  • [37] Ankit Sharma and Jan Vondrák. Multiway cut, pairwise realizable distributions, and descending thresholds. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 724–733, 2014. doi:10.1145/2591796.2591866.
  • [38] Aravind Srinivasan. Distributions on level-sets with applications to approximation algorithms. In Proceedings of the 42nd Symposium on Foundations of Computer Science (FOCS), pages 588–597, 2001. doi:10.1109/SFCS.2001.959935.