
Concentration of Submodular Functions and Read-k Families Under Negative Dependence

Sharmila Duppala, University of Maryland, College Park, MD, USA; George Z. Li, Carnegie Mellon University, Pittsburgh, PA, USA; Juan Luque, University of Maryland, College Park, MD, USA; Aravind Srinivasan, University of Maryland, College Park, MD, USA; Renata Valieva, University of Maryland, College Park, MD, USA
Abstract

We study the question of whether submodular functions of random variables satisfying various notions of negative dependence satisfy Chernoff-like concentration inequalities. We prove such a concentration inequality for the lower tail when the random variables satisfy negative association or negative regression, partially resolving an open problem raised in [23]. Previous work showed such concentration results for random variables arising from specific dependent-rounding algorithms [6, 14]. We discuss some applications of our results to combinatorial optimization and beyond. We also show applications to the concentration of read-k families [13] under certain forms of negative dependence; we further give a simplified, exponential-moment proof of the read-k concentration bound of [13], which was originally established via the entropy method.

Keywords and phrases:
Chernoff bounds, Submodular Functions, Negative Correlation
Copyright and License:
© Sharmila Duppala, George Z. Li, Juan Luque, Aravind Srinivasan, and Renata Valieva; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing → Probabilistic algorithms
Acknowledgements:
We thank the ITCS 2025 reviewers for their thoughtful comments.
Funding:
All authors are supported in part by NSF award number CCF-1918749.
Editors:
Raghu Meka

1 Introduction

Concentration inequalities are ubiquitous in discrete mathematics and theoretical computer science [2, 9]. The most canonical examples are the Chernoff-Hoeffding bounds, which show strong concentration for linear combinations of independent random variables [7, 15]. In some applications, the condition of independence is too restrictive, so weaker notions have been considered [3, 24, 25]. Of interest to us is the setting where the random variables are negatively correlated, which arises naturally, for example, in designing approximation algorithms by solving a linear or semidefinite program and applying some dependent randomized rounding algorithm [11]. For this setting, [20] showed that the Chernoff-Hoeffding bounds can be shown under the weak notion of negative cylinder dependence: this and other standard notions of negative dependence are defined in Section 2.1.

For some applications in combinatorial optimization, algorithmic game theory, and machine learning, one needs to consider the more general class of submodular functions f of the random variables, rather than simple linear combinations. When the binary random variables X1,,Xn are independent, it was shown that f(X1,,Xn) still satisfies Chernoff bounds exactly [6]. When there is dependence between the random variables, the results are much weaker. The only known results are for random variables that are output by specific dependent-rounding algorithms, known as swap rounding and pipage rounding [6, 14]. These results showed that a Chernoff-like lower-tail bound also holds for submodular functions for their specific dependent rounding procedure. As noted in the work of [12], it is not clear how to generalize either of these proofs to any general notion of negative dependence.

We introduce a new notion of negative dependence, called 1-negative association, which is weaker than negative association and negative regression but stronger than negative cylinder dependence.

Definition 1.

A collection of random variables X_1, …, X_n is said to satisfy 1-negative association if for any two monotone functions f and g, where g depends on a single random variable X_i and f depends on the remaining random variables {X_j}_{j ∈ [n] ∖ {i}}, we have 𝔼[fg] ≤ 𝔼[f] · 𝔼[g].

Importantly, while in general it is weaker than the notion of weak negative regression introduced by [23], 1-negative association is equivalent to it when the variables X1,,Xn are binary. Further details are provided in Section 3.1.

Our main result is that the Chernoff-like bounds shown in [6, 14] also hold under 1-negative association (see Section 3.2). In particular, this implies the following:

Theorem 2.

Let X_1, …, X_n be binary random variables with means x_1, …, x_n satisfying negative association (or negative regression). Let f be a non-negative monotone submodular function with marginal values in [0, 1] and let F be the multilinear extension of f. If we let μ_0 = F(x_1, …, x_n), then we have the following:

Pr[f(X_1, …, X_n) ≤ (1 − δ)μ_0] ≤ exp(−μ_0 δ²/2).

A few remarks are in order. First, we highlight that the concentration in the above theorem is with respect to the value of the multilinear extension F(x_1, …, x_n), rather than the true expected value 𝔼[f(X_1, …, X_n)]. In general, the true expected value can be greater than the value of the multilinear extension [23]. Nevertheless, this suffices for applications relating to submodular maximization, and is the same type of concentration result shown in previous work. Second, recall that negative cylinder dependence does not suffice to show this concentration bound [6, p. 583]. As a result, our results are, in some informal sense, almost tight in terms of the condition on negative dependence.
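For a toy illustration of the first remark (this example is ours, not from [23]): take the monotone submodular function f(x_1, x_2) = max(x_1, x_2) and the negatively associated pair X_1 = 1 − X_2 with X_2 ~ Bernoulli(1/2). Then 𝔼[f(X_1, X_2)] = 1, while the multilinear extension evaluates to F(1/2, 1/2) = 3/4, so the true expectation exceeds the quantity the bound concentrates around. A few lines of Python confirm the arithmetic:

```python
import itertools

# f(x1, x2) = max(x1, x2): non-negative, monotone, submodular, marginal values in [0, 1].
f = lambda x1, x2: max(x1, x2)

# Negatively associated pair: X1 = 1 - X2 with X2 ~ Bernoulli(1/2).
E_f_dependent = 0.5 * f(1, 0) + 0.5 * f(0, 1)          # true expectation = 1.0

# Multilinear extension F(1/2, 1/2): expectation under *independent* rounding.
F_half = sum(0.25 * f(a, b) for a, b in itertools.product([0, 1], repeat=2))  # = 0.75

print(E_f_dependent, F_half)   # 1.0 0.75
```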

In addition to providing submodular concentration results for a wide class of rounding algorithms and distributions, our results also give a new path toward understanding why pipage rounding and swap rounding satisfy the lower-tail Chernoff bound. By proving that the rounding algorithms output random variables which are 1-negatively associated, we immediately obtain a new proof of the lower tail bounds. This can be viewed as evidence that the two rounding algorithms satisfy 1-negative association or even negative association/regression. We leave this as an interesting open question.

Techniques

We use the standard method of bounding the exponential moments for lower-tail Chernoff bounds. Our idea is to show that the exponential moments of our negatively correlated random variables are upper bounded by the exponential moments of independent copies of the random variables. Formally, let X_1, …, X_n be random variables satisfying 1-negative association and let X'_1, …, X'_n be independent random variables with the same marginal distributions. We show that for any λ < 0, we have

𝔼[exp(λ f(X_1, …, X_n))] ≤ 𝔼[exp(λ f(X'_1, …, X'_n))].

Since the exponential-moments method has been used to prove Chernoff bounds for submodular functions in the independent case [6], we can then repeat their proof and conclude with our desired result. We believe this proof idea may be of independent interest. For example, the same ideas can show that for a supermodular function g and any λ>0, we have

𝔼[exp(λ g(X_1, …, X_n))] ≤ 𝔼[exp(λ g(X'_1, …, X'_n))].

In other words, we have morally proven the following statement: any upper-tail concentration bound for a supermodular function g that can be proven under independence via the exponential-moments method also holds when the underlying random variables are negatively associated. As an example, we can apply this to a read-k family of supermodular functions g_1, …, g_r of negatively associated random variables [13]. A read-k family is a set of functions in which each variable appears in at most k of the functions. This concept is particularly useful when functions model overlapping sets of variables with constraints on their interaction. We highlight that the proof of concentration for read-k families given in [13] does not use the exponential-moments method; instead, it is based on the entropy method. We bridge this gap by giving a simpler proof of their result, this time using the exponential-moments method. This yields the first concentration results for a class of supermodular functions of negatively correlated random variables, and is detailed in Section 3.3.

Applications

Our motivation for studying the problem comes from the randomized-rounding paradigm in approximation algorithms for converting a fractional solution to a linear program into an integral one. In many such randomized-rounding schemes, the output random variables have been shown to satisfy strong negative dependence properties, such as negative association [26, 11]. For all such rounding algorithms, our results immediately imply the submodular Chernoff lower-tail bound. It remains an interesting open question to efficiently sample negatively dependent distributions for a wider class of set systems. A particularly interesting algorithm is given in the work of [22]; they show that a fractional point in a matroid polytope can be rounded to an integral one such that the resulting distribution preserves marginals and satisfies negative association. However, a gap identified in their proof [23] complicates the application of their approach. The implications of this issue for the applicability of our results remain an area for further investigation.

As a concrete application, we consider the maximum coverage problem under group-fairness constraints. Here, we have a universe of elements {1, …, n}, a collection S_1, …, S_m of subsets of the universe, and a budget k. We are further given subsets C_1, …, C_ℓ ⊆ [n] (which should be thought of as demographic groups) along with thresholds w_1, …, w_ℓ. Our goal is to choose k sets from the collection to maximize the number of elements covered, subject to the fairness constraint that each demographic group is sufficiently covered (i.e., at least w_j elements from C_j are covered). Since this is a special case of multiobjective submodular maximization, there exists a (1 − 1/e − ϵ)-approximation to the problem such that each fairness constraint is approximately satisfied [6, 28]. Unfortunately, these results rely on the randomized swap-rounding algorithm due to its submodular concentration properties, which requires super-linear time. While swap rounding can be implemented in poly-logarithmic depth [5], a simpler dependent-rounding algorithm of [26] requires only linear work and O(log n) depth, which improves the efficiency. Observe that the pre-processing step in [28] only requires O(n) time. Since we can solve the linear program for fair maximum coverage in near-linear time [1], we obtain a near-linear-time algorithm for the problem after using the efficient rounding algorithm of [26]. The same ideas can be used to improve the time complexity of the algorithm of [27] for influence maximization with group-fairness constraints. Since the proofs are similar to previous work, we defer the details to a future version of the paper.
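To make the objects in this application concrete, the following sketch builds a hypothetical toy instance (the instance and names are ours) of the coverage objective and one group-coverage function, and brute-force checks that both are monotone submodular, which is what Theorem 2 requires of them.

```python
from itertools import combinations

# Hypothetical tiny instance of fair maximum coverage: a universe, a few sets,
# and one demographic group.
universe = set(range(6))
sets = {0: {0, 1, 2}, 1: {2, 3}, 2: {3, 4, 5}, 3: {0, 5}}
group = {1, 3, 5}                      # demographic group C_1

def coverage(chosen, restrict=None):
    covered = set().union(*(sets[i] for i in chosen)) if chosen else set()
    return len(covered if restrict is None else covered & restrict)

# Brute-force submodularity check: f(A + {x}) - f(A) >= f(B + {x}) - f(B) for A ⊆ B, x ∉ B.
def is_submodular(f):
    ground = list(sets)
    for r in range(len(ground) + 1):
        for B in combinations(ground, r):
            for s in range(r + 1):
                for A in combinations(B, s):
                    for x in ground:
                        if x in B:
                            continue
                        if f(set(A) | {x}) - f(set(A)) < f(set(B) | {x}) - f(set(B)):
                            return False
    return True

print(is_submodular(lambda T: coverage(T)))                  # objective: True
print(is_submodular(lambda T: coverage(T, restrict=group)))  # group-coverage constraint: True
```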

More generally, negatively associated random variables show up naturally in many settings (see, e.g., the primer by [29]). [8] studied the canonical example of balls and bins, and showed that it satisfies both negative association and negative regression. Another example satisfying the negative-association condition is any product measure over the set of bases of a balanced matroid, as shown by [10]. A final setting where such random variables occur is random spanning trees, which have been vital in the recent improvements to approximation algorithms for the traveling salesperson problem (see, e.g., [16]). Random spanning trees are known to be strongly Rayleigh, which immediately implies that they are negatively associated. Our results may be interesting here as well.

We also observe that the online rounding scheme of [18] has the strongly Rayleigh property: we immediately get strong concentration (on the lower-tail side) for monotone submodular functions, when the inputs for the function arrive online along with their (Bernoulli) distributions as in the setup of [18].

Related Work

The concentration of negatively-dependent random variables was first formally studied by [19], which showed a central limit theorem for a certain notion of negative dependence. Later on, [20] showed that cylinder negatively dependent random variables yield the Chernoff-Hoeffding concentration inequalities, just like independent random variables. In the context of our paper, these results are somewhat specialized since they focus on linear combinations of random variables.

For non-linear functions of the random variables, the majority of work has focused on the concentration of Lipschitz functions under various notions of negative dependence. [21] showed that for strong Rayleigh measures, one has Gaussian concentration for any Lipschitz function. Later on, [12] corrected an earlier proof of [8], showing that McDiarmid-like concentration results hold for Lipschitz functions of random variables satisfying negative regression. These results are complementary to ours since we are trying to give dimension-free concentration results.

2 Preliminaries

2.1 Notions of Negative Dependence

We begin by defining the notion of negative dependence commonly found in the literature.

Negative Cylinder Dependence

A collection of Boolean random variables X_1, …, X_n is said to be negative cylinder dependent if for every S ⊆ [n],

𝔼[∏_{i∈S} X_i] ≤ ∏_{i∈S} 𝔼[X_i]

and

𝔼[∏_{i∈S} (1 − X_i)] ≤ ∏_{i∈S} 𝔼[1 − X_i].

Negative cylinder dependence is the weakest notion considered here. It is known to imply Chernoff bounds for linear combinations of X_1, …, X_n, but it is insufficient to show our submodular concentration results.
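As a quick illustration (ours, not from the paper), the following Python sketch brute-forces both product inequalities for the "one-hot" distribution in which exactly one of X_1, …, X_n equals 1, chosen uniformly at random; this is a standard example of a negatively dependent family.

```python
from itertools import combinations

# One-hot distribution on n binary variables: exactly one coordinate is 1,
# chosen uniformly at random, so each outcome has probability 1/n.
n = 5
outcomes = [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]
prob = 1.0 / n

def expect(h):
    """Expectation of h(x) under the one-hot distribution."""
    return sum(prob * h(x) for x in outcomes)

for size in range(1, n + 1):
    for S in combinations(range(n), size):
        lhs_X = expect(lambda x: all(x[i] == 1 for i in S))     # E[prod X_i]
        lhs_1mX = expect(lambda x: all(x[i] == 0 for i in S))   # E[prod (1 - X_i)]
        rhs_X = (1.0 / n) ** len(S)                              # prod E[X_i]
        rhs_1mX = (1.0 - 1.0 / n) ** len(S)                      # prod E[1 - X_i]
        assert lhs_X <= rhs_X + 1e-12
        assert lhs_1mX <= rhs_1mX + 1e-12
print("negative cylinder dependence verified for the one-hot distribution, n =", n)
```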

Negative Association

A collection of random variables X_1, …, X_n is said to be negatively associated if for any disjoint I, J ⊆ [n] and any pair of non-decreasing functions f: ℝ^I → ℝ and g: ℝ^J → ℝ,

𝔼[f(X_I) g(X_J)] ≤ 𝔼[f(X_I)] · 𝔼[g(X_J)].

Here and in the following, X_S refers to the random variables indexed by the elements of S, i.e., X_S = {X_i : i ∈ S}. Negative association is a significant strengthening of negative cylinder dependence, and has many additional useful closure properties. This will be one of the focuses of the paper.

Negative Regression

A collection of random variables X_1, …, X_n is said to satisfy negative regression if for any disjoint I, J ⊆ [n], any non-decreasing function f: ℝ^I → ℝ, and any a ≤ b ∈ ℝ^J,

𝔼[f(X_I) ∣ X_J = a] ≥ 𝔼[f(X_I) ∣ X_J = b].

Negative regression is a strengthening of negative cylinder dependence, but its relationship with negative association is not yet well understood. It is known that negative association doesn’t imply negative regression [8], but the opposite implication is not known. This will be the other focus of the paper.

Strong Rayleigh

A collection of random variables X_1, …, X_n is said to satisfy the strong Rayleigh property if the generating function

F(z_1, …, z_n) = 𝔼[∏_{j=1}^{n} z_j^{X_j}]

is a real stable polynomial (i.e., it has no root (z_1, …, z_n) ∈ ℂ^n in which every coordinate has strictly positive imaginary part). The strong Rayleigh property is the strongest notion of negative dependence discussed here, and has been shown to imply all of the other studied negative-dependence definitions [4]. As a result, all of our results apply here as well.

2.2 Submodular Functions

We also give a quick review of the basics of submodular functions.

Submodular Functions

We say that a function f: {0,1}^n → ℝ is submodular if

f(X_1, …, X_{i−1}, 1, X_{i+1}, …, X_n) − f(X_1, …, X_{i−1}, 0, X_{i+1}, …, X_n)

is a non-increasing function of X_1, …, X_{i−1}, X_{i+1}, …, X_n for each i ∈ [n]. When viewing the binary input of f as the indicator vector of a set, this is equivalent to the more common definition that f is submodular if for any X ⊆ Y ⊆ [n] and any x ∉ Y, we have

f(X ∪ {x}) − f(X) ≥ f(Y ∪ {x}) − f(Y).

Supermodular Functions

We say that a function g: {0,1}^n → ℝ is supermodular if

g(X_1, …, X_{i−1}, 1, X_{i+1}, …, X_n) − g(X_1, …, X_{i−1}, 0, X_{i+1}, …, X_n)

is a non-decreasing function of X_1, …, X_{i−1}, X_{i+1}, …, X_n for each i ∈ [n]. When viewing the binary input of g as the indicator vector of a set, this is equivalent to the more common definition that g is supermodular if for any X ⊆ Y ⊆ [n] and any x ∉ Y, we have

g(X ∪ {x}) − g(X) ≤ g(Y ∪ {x}) − g(Y).

Multilinear Extension

The multilinear extension of a function f is

F(x) = ∑_{S ⊆ [n]} f(S) ∏_{i ∈ S} x_i ∏_{i ∉ S} (1 − x_i),

for x ∈ [0,1]^n. If we view x as a vector of probabilities, the multilinear extension F(x) is simply the expected value of f when each coordinate i is independently rounded to 1 with probability x_i.
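For intuition, here is a small (exponential-time, illustration-only) sketch of evaluating the multilinear extension by enumerating all 2^n roundings; the example coverage function is our own choice.

```python
from itertools import product

def multilinear_extension(f, x):
    """F(x): the expectation of f when coordinate i is independently rounded to 1
    with probability x_i, computed by enumerating all 0/1 roundings."""
    total = 0.0
    for bits in product([0, 1], repeat=len(x)):
        weight = 1.0
        for xi, b in zip(x, bits):
            weight *= xi if b == 1 else 1.0 - xi
        total += weight * f(bits)
    return total

# Example: coverage function of the sets {0, 1} and {1, 2} over a 3-element universe.
sets = [{0, 1}, {1, 2}]
f = lambda bits: len(set().union(*[sets[i] for i, b in enumerate(bits) if b]))
print(multilinear_extension(f, [0.5, 0.5]))   # expected coverage under independent rounding
```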

3 Submodular Chernoff Bounds

3.1 1-Negative Association and Weak Negative Regression

We first define the weaker notion of negative dependence which we work with, called 1-negative association, and prove some simple properties about it. We also define a related notion of weak negative regression, which is the analogue of 1-negative association for the notion of negative regression, and we show the equivalence between the two for binary random variables and show that weak negative regression is strictly stronger in general. After an initial draft, we discovered that [23] had already introduced the notion of weak negative regression for binary random variables in a context complementary to ours. Using their work, we can immediately show nice properties about 1-negative association.

Definition 3.

A collection of random variables X_1, …, X_n is said to satisfy 1-negative association if for any two monotone functions f and g, where g depends on a single random variable X_i and f depends on the remaining random variables {X_j}_{j ∈ [n] ∖ {i}}, we have 𝔼[fg] ≤ 𝔼[f] · 𝔼[g].

Definition 4.

A collection of random variables X_1, …, X_n is said to satisfy weak negative regression if for any index i and any monotone (say, non-decreasing) function f depending on the remaining random variables {X_j}_{j ∈ [n] ∖ {i}}, we have 𝔼[f ∣ X_i = b] ≤ 𝔼[f ∣ X_i = a] for all a ≤ b.

In the following lemmata, we show that weak negative regression implies 1-negative association in general. We then show that the reverse implication holds for binary random variables, but give an example showing that it does not hold in general.

Claim 5.

If a collection of random variables X1,,Xn satisfies weak negative regression, then it satisfies 1-negative association.

Proof.

Assume X_1, …, X_n satisfy weak negative regression; we will prove that they also satisfy 1-negative association. Let f and g be monotone functions such that f depends on X_I for some subset I ⊆ [n] and g depends on X_i for some i ∉ I. Without loss of generality, let us assume that f and g are non-decreasing.

First, define the function F(a) = 𝔼[f(X_I) ∣ X_i = a]. Evaluating the expectation 𝔼[f(X_I) g(X_i)], we can express it via the law of total expectation as

𝔼[f(X_I) g(X_i)] = 𝔼[𝔼[f(X_I) ∣ X_i] · g(X_i)] = 𝔼[F(X_i) g(X_i)].

Next, we observe that by the definition of weak negative regression, F(a) is non-increasing. Therefore, random variables F(Xi) and g(Xi) are negatively correlated, yielding

𝔼[F(X_i) g(X_i)] ≤ 𝔼[F(X_i)] · 𝔼[g(X_i)].

Converting 𝔼[F(Xi)] back into the terms of f and g, we find

𝔼[F(X_i)] · 𝔼[g(X_i)] = 𝔼[f(X_I)] · 𝔼[g(X_i)].

The overall inequality 𝔼[f(X_I) g(X_i)] ≤ 𝔼[f(X_I)] · 𝔼[g(X_i)] thus holds, which establishes that X_1, …, X_n are 1-negatively associated, completing the proof.

Claim 6.

If a collection of binary random variables X1,,Xn satisfies 1-negative association, then it satisfies weak negative regression.

Proof.

Recall that we wish to prove that for any non-decreasing function f depending on X_I for some subset I ⊆ [n] and any i ∉ I, we have that

𝔼[f(X_I) ∣ X_i = 0] ≥ 𝔼[f(X_I) ∣ X_i = 1].

By the definition of 1-negative association, we obtain that for any monotone functions g which depends on Xi, the following inequality holds:

𝔼[f(X_I) g(X_i)] ≤ 𝔼[f(X_I)] · 𝔼[g(X_i)].   (1)

Without loss of generality, we may assume that 𝔼[f(XI)|Xi=1]=0 by shifting f by a constant. By choosing g to be the identity function, we can apply the law of total probability to obtain

𝔼[f(X_I) g(X_i)] = Pr[X_i = 0] · 𝔼[f(X_I) ∣ X_i = 0] · g(0) + Pr[X_i = 1] · 𝔼[f(X_I) ∣ X_i = 1] · g(1) = 0.

Plugging this into Equation 1, we obtain

0 ≤ 𝔼[f(X_I)] · 𝔼[g(X_i)] = 𝔼[f(X_I) ∣ X_i = 0] · Pr[X_i = 0] · 𝔼[X_i],

again by the law of total probability. Since 𝔼[Xi]>0 and Pr[Xi=0]>0, this implies that

𝔼[f(X_I) ∣ X_i = 0] ≥ 0,

which concludes the proof since 𝔼[f(X_I) ∣ X_i = 1] = 0.

Claim 7.

There exist (non-binary) distributions over 2 random variables which satisfy 1-negative association but not weak negative regression. In other words, 1-negative association is strictly more general than weak negative regression for non-binary random variables.

Proof.

Let us first discuss the intuition behind the construction of the counterexample. One can show via algebra that 𝔼[f(X_I) g(X_i)] − 𝔼[f(X_I)] · 𝔼[g(X_i)] can be expanded as the following expression:

∑_{x<y} Pr[X_i = x] · Pr[X_i = y] · (𝔼[f(X_I) ∣ X_i = x] − 𝔼[f(X_I) ∣ X_i = y]) · (g(x) − g(y)),

where the summation is over the values that Xi takes with non-zero probability.

Now, suppose that for some values x<y we have that

𝔼[f(X_1, …, X_n) ∣ X_i = x] − 𝔼[f(X_1, …, X_n) ∣ X_i = y] < 0,

violating the weak negative regression property. This would make the corresponding summand positive whenever g(x) < g(y), which we can arrange since g only needs to be monotone. In the case of binary random variables, the summation consists of a single summand, so 1-negative association would be violated. For general random variables, the summation consists of multiple terms, so it may still be non-positive even when a single summand is positive. Consequently, the random variables may still satisfy 1-negative association.

We now give the example. Consider the random variables (X_1, X_2) which are uniformly distributed on the support set {(0,3), (1,1), (2,2), (3,0)}. By considering the identity function 𝟙: {0,1,2,3} → {0,1,2,3}, we can show that the distribution of (X_1, X_2) does not satisfy weak negative regression:

𝔼[𝟙(X_2) ∣ X_1 = 1] = 𝟙(1) = 1 < 2 = 𝟙(2) = 𝔼[𝟙(X_2) ∣ X_1 = 2].

However, for any pair of non-decreasing functions f,g:{0,1,2,3}, we have

𝔼[f(X_1)] · 𝔼[g(X_2)] − 𝔼[f(X_1) g(X_2)] = ((f(1) + f(2) + f(3))/4) · ((g(1) + g(2) + g(3))/4) − (f(1)g(1) + f(2)g(2))/4,

where we have again assumed without loss of generality that f(0)=g(0)=0.

We claim that the quantity on the right hand side is always non-negative. In order to see this, observe that f(2)g(2) ≤ f(i)g(j) for any i, j ≥ 2 by monotonicity. As a result, we have

f(2)g(2)/4 ≤ (f(2) + f(3))(g(2) + g(3))/16.

Further, we observe that f(1)g(1) ≤ f(i)g(j) for any i, j ≥ 1 by monotonicity. As a result, we have

f(1)g(1)/4 ≤ [f(1)(g(2) + g(3)) + g(1)(f(2) + f(3))]/16.

Combining these two inequalities and observing that f(1)g(1) ≥ 0 by monotonicity (recall f(0) = g(0) = 0) immediately implies our desired result. Hence, the distribution is 1-negatively associated.
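The counterexample can also be spot-checked numerically. The script below is a brute-force sanity check (not a proof), since it restricts attention to non-decreasing functions taking values on the small grid {0, 1, 2, 3} with f(0) = g(0) = 0 (the latter is without loss of generality by shifting): it verifies that no such pair violates 1-negative association, while the identity function already witnesses the failure of weak negative regression.

```python
from itertools import combinations_with_replacement, product

support = [(0, 3), (1, 1), (2, 2), (3, 0)]   # uniform distribution on these pairs
p = 0.25

# Weak negative regression fails: E[X2 | X1 = 1] = 1 < 2 = E[X2 | X1 = 2].

# Spot-check 1-negative association: E[f(X1) g(X2)] <= E[f(X1)] E[g(X2)] for all
# non-decreasing f, g on {0,1,2,3} with values in {0,1,2,3} and f(0) = g(0) = 0.
def monotone_functions():
    for vals in combinations_with_replacement(range(4), 3):
        yield (0,) + vals               # non-decreasing 4-tuple with value 0 at input 0

violations = 0
for f, g in product(monotone_functions(), repeat=2):
    E_fg = sum(p * f[x1] * g[x2] for x1, x2 in support)
    E_f = sum(p * f[x1] for x1, _ in support)
    E_g = sum(p * g[x2] for _, x2 in support)
    if E_fg > E_f * E_g + 1e-12:
        violations += 1
print("violations of 1-negative association found:", violations)   # expected: 0
```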

Since 1-negative association and weak negative regression are equivalent for binary random variables and weak negative regression has been shown to be strictly stronger than cylinder negative dependence [23, Proposition 2.4], we also have that 1-negative association is strictly stronger than cylinder negative dependence. Additionally, since weak negative regression is strictly stronger than 1-negative association for general random variables and weak negative regression has been shown to be strictly weaker than negative association and negative regression [23, Proposition 2.4], we have that 1-negative association is strictly weaker than negative association and negative regression. We summarize these in the following corollaries.

Corollary 8.

1-negative association is a strictly weaker condition than negative association.

Corollary 9.

1-negative association is a strictly weaker condition than negative regression.

Corollary 10.

1-negative association is a strictly stronger condition than negative cylinder dependence.

3.2 Proof of Submodular Concentration

We will now prove our main result. As mentioned in the introduction, our proof is based on the standard technique of bounding the exponential moments. The following lemma contains our main technical contribution, stating that the exponential moments of f(X1,,Xn) under 1-negative association is dominated by that under independence. Our results will follow easily afterwards.

Lemma 11.

Let X_1, …, X_n be 1-negatively associated random variables and let X'_1, …, X'_n be independent random variables with the same marginal distributions. Also let f be a non-negative monotone function.

  • If f is submodular and λ < 0, we have
    𝔼[exp(λ f(X_1, …, X_n))] ≤ 𝔼[exp(λ f(X'_1, …, X'_n))].

  • If f is supermodular and λ > 0, we have
    𝔼[exp(λ f(X_1, …, X_n))] ≤ 𝔼[exp(λ f(X'_1, …, X'_n))].

Proof.

Fix λ<0 if f is submodular and λ>0 if f is supermodular. Observe that in order to prove the lemma, it suffices to prove

𝔼[exp(λ f(X_1, …, X_i, …, X_n))] ≤ 𝔼[exp(λ f(X_1, …, X'_i, …, X_n))],   (2)

since we can iteratively apply the above inequality to each Xi (note that we can do this because independent variables are also negatively associated). For simplicity of notation and without loss of generality, we will prove the inequality for i=1.

By considering the cases of X1=0 and X1=1 separately, we have

exp(λ f(X_1, …, X_n)) = X_1 · exp(λ f(1, X_2, …, X_n)) + (1 − X_1) · exp(λ f(0, X_2, …, X_n)),

where the equality holds pointwise on the underlying probability space. Via simple algebraic manipulations, we can rewrite the right hand side as

X_1 · [exp(λ f(1, X_2, …, X_n)) − exp(λ f(0, X_2, …, X_n))] + exp(λ f(0, X_2, …, X_n)).

Taking expectations, we now have that 𝔼[exp(λf(X1,,Xn))] can be written as

𝔼[X_1 · [exp(λ f(1, X_2, …, X_n)) − exp(λ f(0, X_2, …, X_n))]] + 𝔼[exp(λ f(0, X_2, …, X_n))].   (3)

Observe that X_1 is clearly an increasing function of X_1. We claim that if either (i) f is submodular and λ < 0 or (ii) f is supermodular and λ > 0, then exp(λ f(1, X_2, …, X_n)) − exp(λ f(0, X_2, …, X_n)) is a non-decreasing function of X_2, …, X_n. Indeed, we first rewrite this difference as

exp(λ f(0, X_2, …, X_n)) · [exp(λ (f(1, X_2, …, X_n) − f(0, X_2, …, X_n))) − 1] ≕ A_1 · A_2

for simplicity of notation.

Let us first consider the case when λ<0 and f is submodular. We have that A1 is (i) positive because the exponential function is always positive and (ii) non-increasing in X2,,Xn because f is non-decreasing and λ<0. We also have that A2 is (i) negative because the argument in exp() is negative, so the exponential is in (0,1) (ii) non-decreasing since λ<0 and the difference of f evaluated at X1=1 and X1=0 is non-increasing by definition of submodularity. Hence, our expression of interest is the product of a function A1 which decreases towards 0 and a function A2 which increases towards 0. The product will be negative and monotonically increasing towards 0.

Now, let us consider the case when λ>0 and f is supermodular. We have that A1 is (i) positive because the exponential function is always positive and (ii) non-decreasing since λ>0, f is monotone, and exp() is also monotone. We also have that A2 is (i) positive because the argument of exp() is positive since f is monotone so the exponential is greater than 1 and (ii) non-decreasing since λ>0 and the difference of f evaluated at X1=1 and X1=0 is non-decreasing by definition of supermodularity. As a result, the product will be positive and non-decreasing, as desired.

Since we have shown that A_1 · A_2 is also monotone, we now have that the first term in Equation 3 is the expectation of a product of monotone functions of disjoint subsets of the variables, one of which is the singleton {X_1}. By 1-negative association, the first term is upper bounded by

𝔼[X_1] · 𝔼[exp(λ f(1, X_2, …, X_n)) − exp(λ f(0, X_2, …, X_n))].

Consequently, the entire expression in (3) is upper bounded by

𝔼[X_1] · 𝔼[exp(λ f(1, X_2, …, X_n)) − exp(λ f(0, X_2, …, X_n))] + 𝔼[exp(λ f(0, X_2, …, X_n))].

Since X_1 and X'_1 have the same marginal distribution, the above is exactly equal to

𝔼[X'_1] · 𝔼[exp(λ f(1, X_2, …, X_n)) − exp(λ f(0, X_2, …, X_n))] + 𝔼[exp(λ f(0, X_2, …, X_n))].

And since X'_1 is independent of X_2, …, X_n by assumption, the above is equal to

𝔼[X'_1 · (exp(λ f(1, X_2, …, X_n)) − exp(λ f(0, X_2, …, X_n)))] + 𝔼[exp(λ f(0, X_2, …, X_n))].

In particular, observe that this is in the exact same form as Equation 3, except with X_1 replaced by X'_1. Note that when we transformed the left-hand side of Equation 2 into Equation 3, we never used any properties of the random variables X_1, …, X_n other than the fact that they take values in {0,1}. As a result, we can reverse the direction of all of the equalities to show that the above expression is equal to

𝔼[exp(λ f(X'_1, X_2, …, X_n))],

which completes the proof of the lemma.

Now, we will complete the proof of our main result. Combining the theorem below with Corollaries 8 and 9 immediately gives a proof of Theorem 2. Here, our proof will rely heavily on the proof of the Chernoff bound for submodular functions under independence given in [6].

Theorem 12.

Let X_1, …, X_n be binary random variables with means x_1, …, x_n satisfying 1-negative association. Let f be a non-negative monotone submodular function with marginal values in [0, 1] and let F be the multilinear extension of f. If we let μ_0 = F(x_1, …, x_n), then we have the following:

Pr[f(X_1, …, X_n) ≤ (1 − δ)μ_0] ≤ exp(−μ_0 δ²/2).
Proof.

Let X'_1, …, X'_n be independent random variables with the same respective marginals as X_1, …, X_n and let λ < 0 be a parameter to be set later. Let us decompose f(X'_1, …, X'_n) = ∑_{i=1}^n Y_i, where

Y_i = f(X'_1, …, X'_i, 0, …, 0) − f(X'_1, …, X'_{i−1}, 0, …, 0).

Let us denote 𝔼[Y_i] = ω_i, so that μ_0 = ∑_{i=1}^n ω_i = 𝔼[f(X'_1, …, X'_n)] = F(x_1, …, x_n). By the convexity of the exponential and the fact that Y_i ∈ [0, 1], we have that

𝔼[exp(λ Y_i)] ≤ ω_i exp(λ) + (1 − ω_i) = 1 + [exp(λ) − 1] ω_i ≤ exp[(exp(λ) − 1) ω_i].

Combining the above with Lemma C.1 from [6], we have that

𝔼[exp(λ f(X'_1, …, X'_n))] = 𝔼[exp(λ ∑_{i=1}^n Y_i)] ≤ ∏_{i=1}^n 𝔼[exp(λ Y_i)]
   ≤ exp[(exp(λ) − 1) μ_0].   (4)

Now, we can follow the proof of the standard Chernoff bound:

Pr[f(X_1, …, X_n) ≤ (1 − δ)μ_0] = Pr[exp(λ f(X_1, …, X_n)) ≥ exp(λ(1 − δ)μ_0)]
   ≤ 𝔼[exp(λ f(X_1, …, X_n))] / exp(λ(1 − δ)μ_0)
   ≤ 𝔼[exp(λ f(X'_1, …, X'_n))] / exp(λ(1 − δ)μ_0)
   ≤ exp[(exp(λ) − 1) μ_0] / exp(λ(1 − δ)μ_0).

The first equality follows since exp(λx) is a monotone function, the first inequality follows by Markov's inequality, the second inequality follows by Lemma 11, and the final inequality follows from Equation 4.

Finally, we can choose λ such that exp(λ) = 1 − δ, which gives

Pr[f(X_1, …, X_n) ≤ (1 − δ)μ_0] ≤ exp(−δ μ_0) / (1 − δ)^{(1−δ)μ_0} ≤ exp(−μ_0 δ²/2),

where we used (1 − δ)^{1−δ} ≥ exp(−δ + δ²/2) for δ ∈ (0, 1] in the final inequality.
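The elementary inequality used in this last step can be sanity-checked numerically; the snippet below (ours, for illustration only) verifies it on a grid of δ values.

```python
import math

# Check (1 - d)^(1 - d) >= exp(-d + d*d/2) for d in (0, 1].
for k in range(1, 1001):
    d = k / 1000.0
    lhs = (1.0 - d) ** (1.0 - d)        # at d = 1 this evaluates to 0.0 ** 0.0 == 1.0 in Python
    rhs = math.exp(-d + d * d / 2.0)
    assert lhs >= rhs - 1e-12, d
print("inequality verified on the grid")
```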

3.3 Concentration of read-𝒌 families

In this subsection, we illustrate an application of our proof technique to give concentration for a read-k family of supermodular functions. Read-k families arise naturally in problems such as subgraph counting in random graphs, and can be seen as a complementary weak dependence notion to that of low-degree polynomials [17]. Our work gives the first concentration results for these problems under negative dependence.

Let’s consider this notion of weak dependence defined in [13]. Let Y1,,Yn be random variables and assume that they can be factored as functions of random variables X1,,Xm. We say that Y1,,Yn are a read-k family of X1,,Xm if for each variable Xi, there are at most k variables among Y1,,Yn that are influenced by Xi. Formally, we have the following.

Definition 13.

Let X_1, …, X_m be random variables. For each j ∈ [n], let P_j ⊆ [m] and let f_j: {0,1}^{P_j} → [0,1] be a function of X_{P_j}. We say that the Y_j = f_j(X_{P_j}) are a read-k family if |{j : i ∈ P_j}| ≤ k for each i ∈ [m] (i.e., each variable X_i influences at most k of the functions).
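As a toy illustration of the definition (our example, not from [13]): take the 6 edge-indicator variables of the complete graph K_4 and, for each of its 4 triangles T, the function Y_T that equals 1 exactly when all three edges of T are present. Each Y_T is a monotone supermodular (AND-type) function of its three edge variables, and every edge lies in exactly two triangles, so the family is read-2.

```python
from itertools import combinations

vertices = range(4)
edges = list(combinations(vertices, 2))        # the variables X_e (6 of them)
triangles = list(combinations(vertices, 3))    # index set for the functions Y_T

def triangle_edges(tri):
    """P_T: the edge variables read by Y_T."""
    return [e for e in edges if set(e) <= set(tri)]

# How many functions read each variable? The maximum is k in Definition 13.
reads = {e: sum(1 for t in triangles if e in triangle_edges(t)) for e in edges}
print(max(reads.values()))    # 2, so {Y_T} is a read-2 family
```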

When X1,,Xm are independent, [13] showed that we have

Pr[∑_{j=1}^n f_j(X_{P_j}) ≥ (p + ϵ)n] ≤ exp(−D(p + ϵ ∥ p) · n/k)   (5)
Pr[∑_{j=1}^n f_j(X_{P_j}) ≤ (p − ϵ)n] ≤ exp(−D(p − ϵ ∥ p) · n/k),   (6)

where p = (1/n) ∑_{j=1}^n 𝔼[Y_j] and D(· ∥ ·) is the Kullback–Leibler divergence. Notably, while Gavinsky et al. do not require the random variables X_1, …, X_m to be binary, as we do in our approach, their model requires Y_1, …, Y_n to be binary. We will show that Inequality 5 on the upper tail still holds for supermodular functions f_1, …, f_n. Similarly, Inequality 6, which addresses the lower-tail bound, continues to apply to submodular functions f_1, …, f_n.

Theorem 14.

Let X_1, …, X_m be 1-negatively associated random variables and let X'_1, …, X'_m be independent random variables with the same respective marginal distributions as X_1, …, X_m. Suppose that f_j(X_{P_j}) for j ∈ [n] are a read-k family, where the f_j: {0,1}^{P_j} → [0,1] are supermodular functions. If we let p_0 = (1/n) ∑_{j=1}^n 𝔼[f_j(X'_{P_j})] denote the averaged expectation when the underlying random variables are independent, we have

Pr[∑_{j=1}^n f_j(X_{P_j}) ≥ (p_0 + ϵ)n] ≤ exp(−D(p_0 + ϵ ∥ p_0) · n/k).
Proof.

Let f(X_1, …, X_m) = ∑_{j=1}^n f_j(X_{P_j}) be the quantity of interest, and note that f is a sum of supermodular functions, so it is supermodular as well.

We will follow the standard proof via exponential moments. Let λ>0; we have

Pr[f(X_1, …, X_m) ≥ (p_0 + ϵ)n] = Pr[exp(λ f(X_1, …, X_m)) ≥ exp(λ(p_0 + ϵ)n)]   (7)
   ≤ 𝔼[exp(λ f(X_1, …, X_m))] / exp(λ(p_0 + ϵ)n),   (8)

where the inequality follows by Markov’s. Since f is supermodular, we have by Lemma 11 that

𝔼[exp(λ f(X_1, …, X_m))] ≤ 𝔼[exp(λ f(X'_1, …, X'_m))] = 𝔼[exp(λ ∑_{j=1}^n f_j(X'_{P_j}))].   (9)

In the new proof of concentration of read-k families, given in the Appendix, we show that

𝔼[exp(λ ∑_{j=1}^n f_j(X'_{P_j}))] ≤ (∏_{j=1}^n 𝔼[exp(λ f_j(X'_{P_j}))^k])^{1/k}.   (10)

Combining equations (7)–(10), we have

Pr[f(X_1, …, X_m) ≥ (p_0 + ϵ)n] ≤ ((∏_{j=1}^n 𝔼[exp(kλ f_j(X'_{P_j}))]) / exp(kλ(p_0 + ϵ)n))^{1/k}.

Let λ' = kλ; since λ > 0 is a parameter we set, we can view λ' > 0 as a parameter as well. We will abuse notation and replace λ' with λ, so we have

Pr[f(X_1, …, X_m) ≥ (p_0 + ϵ)n] ≤ ((∏_{j=1}^n 𝔼[exp(λ f_j(X'_{P_j}))]) / exp(λ(p_0 + ϵ)n))^{1/k},

for any λ>0. Now, observe that the right hand side of the inequality is the exact same as in the proof of the standard Chernoff bound under independence, except with an additional exponent 1/k. As a result, we can follow the original proof of the Chernoff bound to show that

Pr[∑_{j=1}^n f_j(X_{P_j}) ≥ (p_0 + ϵ)n] ≤ exp(−D(p_0 + ϵ ∥ p_0) · n/k),

which was our desired result.

Corollary 15.

Let X_1, …, X_m be 1-negatively associated random variables and let X'_1, …, X'_m be independent random variables with the same respective marginal distributions. Suppose that f_j(X_{P_j}) for j ∈ [n] are a read-k family, where the f_j: {0,1}^{P_j} → [0,1] are submodular functions. If we let p_0 = (1/n) ∑_{j=1}^n 𝔼[f_j(X'_{P_j})] denote the averaged expectation when the underlying random variables are independent, we have

Pr[∑_{j=1}^n f_j(X_{P_j}) ≤ (p_0 − ϵ)n] ≤ exp(−D(p_0 − ϵ ∥ p_0) · n/k).
Proof.

Define g_j := 1 − f_j, where the g_j: {0,1}^{P_j} → [0,1] are supermodular. Then 1 − p_0 = (1/n) ∑_{j=1}^n 𝔼[g_j(X'_{P_j})]. Applying Theorem 14, we have:

Pr[∑_{j=1}^n g_j(X_{P_j}) ≥ (1 − (p_0 − ϵ))n] = Pr[∑_{j=1}^n g_j(X_{P_j}) ≥ (1 − p_0 + ϵ)n] ≤ exp(−D(1 − p_0 + ϵ ∥ 1 − p_0) · n/k).

The property of the Kullback–Leibler divergence that D(1 − p ∥ 1 − q) = D(p ∥ q) implies

Pr[∑_{j=1}^n f_j(X_{P_j}) ≤ (p_0 − ϵ)n] = Pr[∑_{j=1}^n g_j(X_{P_j}) ≥ (1 − (p_0 − ϵ))n]
   ≤ exp(−D(p_0 − ϵ ∥ p_0) · n/k),

which concludes the proof.

References

  • [1] Zeyuan Allen-Zhu and Lorenzo Orecchia. Nearly-linear time positive lp solver with faster convergence rate. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, STOC ’15, pages 229–236. Association for Computing Machinery, 2015. doi:10.1145/2746539.2746573.
  • [2] Noga Alon and Joel H. Spencer. The Probabilistic Method, Third Edition. Wiley-Interscience series in discrete mathematics and optimization. Wiley, 2008.
  • [3] Kazuoki Azuma. Weighted sums of certain dependent random variables. Tohoku Mathematical Journal, 19(3):357–367, 1967. doi:10.2748/tmj/1178243286.
  • [4] Julius Borcea, Petter Brändén, and Thomas M. Liggett. Negative dependence and the geometry of polynomials. Journal of the American Mathematical Society, 22(2):521–567, 2009. URL: http://www.jstor.org/stable/40587241.
  • [5] Chandra Chekuri and Kent Quanrud. Submodular function maximization in parallel via the multilinear relaxation. In Timothy M. Chan, editor, Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 303–322. SIAM, 2019. doi:10.1137/1.9781611975482.20.
  • [6] Chandra Chekuri, Jan Vondrák, and Rico Zenklusen. Dependent randomized rounding via exchange properties of combinatorial structures. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, pages 575–584. IEEE Computer Society, 2010. doi:10.1109/FOCS.2010.60.
  • [7] Herman Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the sum of Observations. The Annals of Mathematical Statistics, 23(4):493–507, 1952. doi:10.1214/aoms/1177729330.
  • [8] Devdatt Dubhashi and Desh Ranjan. Balls and bins: a study in negative dependence. Random Struct. Algorithms, 13(2):99–124, September 1998. doi:10.1002/(SICI)1098-2418(199809)13:2<99::AID-RSA1>3.0.CO;2-M.
  • [9] Devdatt P. Dubhashi and Alessandro Panconesi. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, 2009. URL: http://www.cambridge.org/gb/knowledge/isbn/item2327542/.
  • [10] Tomás Feder and Milena Mihail. Balanced matroids. In S. Rao Kosaraju, Mike Fellows, Avi Wigderson, and John A. Ellis, editors, Proceedings of the 24th Annual ACM Symposium on Theory of Computing, STOC 1992, pages 26–38. ACM, 1992. doi:10.1145/129712.129716.
  • [11] Rajiv Gandhi, Samir Khuller, Srinivasan Parthasarathy, and Aravind Srinivasan. Dependent rounding and its applications to approximation algorithms. J. ACM, 53(3):324–360, 2006. doi:10.1145/1147954.1147956.
  • [12] Kevin Garbe and Jan Vondrák. Concentration of lipschitz functions of negatively dependent variables, 2018. arXiv:1804.10084.
  • [13] Dmitry Gavinsky, Shachar Lovett, Michael E. Saks, and Srikanth Srinivasan. A tail bound for read-k families of functions. Random Struct. Algorithms, 47(1):99–108, 2015. doi:10.1002/rsa.20532.
  • [14] Nicholas J. A. Harvey and Neil Olver. Pipage rounding, pessimistic estimators and matrix concentration. In Chandra Chekuri, editor, Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, pages 926–945. SIAM, 2014. doi:10.1137/1.9781611973402.69.
  • [15] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963. URL: http://www.jstor.org/stable/2282952.
  • [16] Anna R. Karlin, Nathan Klein, and Shayan Oveis Gharan. A (slightly) improved approximation algorithm for metric TSP. In Samir Khuller and Virginia Vassilevska Williams, editors, STOC ’21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 32–45. ACM, 2021. doi:10.1145/3406325.3451009.
  • [17] Jeong Han Kim and Van H. Vu. Concentration of multivariate polynomials and its applications. Comb., 20(3):417–434, 2000. doi:10.1007/s004930070014.
  • [18] Joseph (Seffi) Naor, Aravind Srinivasan, and David Wajc. Online dependent rounding schemes. arXiv preprint, 2024. arXiv:2301.08680.
  • [19] Charles M. Newman. Asymptotic independence and limit theorems for positively and negatively dependent random variables. Lecture Notes-Monograph Series, 5:127–140, 1984. URL: http://www.jstor.org/stable/4355492.
  • [20] Alessandro Panconesi and Aravind Srinivasan. Randomized distributed edge coloring via an extension of the chernoff-hoeffding bounds. SIAM J. Comput., 26(2):350–368, 1997. doi:10.1137/S0097539793250767.
  • [21] Robin Pemantle and Yuval Peres. Concentration of lipschitz functionals of determinantal and other strong rayleigh measures. Comb. Probab. Comput., 23(1):140–160, 2014. doi:10.1017/S0963548313000345.
  • [22] Yuval Peres, Mohit Singh, and Nisheeth K. Vishnoi. Random walks in polytopes and negative dependence. In Christos H. Papadimitriou, editor, 8th Innovations in Theoretical Computer Science Conference, ITCS 2017, January 9-11, 2017, Berkeley, CA, USA, volume 67 of LIPIcs, pages 50:1–50:10. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPIcs.ITCS.2017.50.
  • [23] Frederick Qiu and Sahil Singla. Submodular dominance and applications. In Amit Chakrabarti and Chaitanya Swamy, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2022, volume 245 of LIPIcs, pages 44:1–44:21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.APPROX/RANDOM.2022.44.
  • [24] Jeanette P. Schmidt, Alan Siegel, and Aravind Srinivasan. Chernoff-hoeffding bounds for applications with limited independence. SIAM J. Discret. Math., 8(2):223–250, 1995. doi:10.1137/S089548019223872X.
  • [25] Maciej Skorski. Tight chernoff-like bounds under limited independence. In Amit Chakrabarti and Chaitanya Swamy, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2022, volume 245 of LIPIcs, pages 15:1–15:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.APPROX/RANDOM.2022.15.
  • [26] Aravind Srinivasan. Distributions on level-sets with applications to approximation algorithms. In 42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, pages 588–597. IEEE Computer Society, 2001. doi:10.1109/SFCS.2001.959935.
  • [27] Alan Tsang, Bryan Wilder, Eric Rice, Milind Tambe, and Yair Zick. Group-fairness in influence maximization. In Sarit Kraus, editor, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, pages 5997–6005. ijcai.org, 2019. doi:10.24963/ijcai.2019/831.
  • [28] Rajan Udwani. Multi-objective maximization of monotone submodular functions with cardinality constraint. In Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and Roman Garnett, editors, Annual Conference on Neural Information Processing Systems, NeurIPS 2018, pages 9513–9524, 2018. URL: https://proceedings.neurips.cc/paper/2018/hash/7e448ed9dd44e6e22442dac8e21856ae-Abstract.html.
  • [29] David Wajc. Negative association: Definition, properties, and applications. Manuscript, 2017. URL: https://www.cs.cmu.edu/~dwajc/notes/Negative%20Association.pdf.

Appendix A Concentration of Read-𝒌 Families

We will give a new, simpler proof of the results of Gavinsky et al. [13] using exponential moments of the f_j for independent random variables X_1, …, X_m. Our proof will use the following lemma.

Lemma 16.

For an arbitrary read-k family F_1, …, F_n of non-negative functions, we have

𝔼[∏_{j=1}^n F_j] ≤ (∏_{j=1}^n 𝔼[F_j^k])^{1/k}.   (11)

Using this lemma, if we define

F_j = exp(λ f_j(X_1, …, X_m)),

it is easy to see that F_1, …, F_n are a read-k family since f_1, …, f_n are read-k. We will show below that after applying inequality (11) to F_1, …, F_n, we can adapt the standard Chernoff bound proof to our setting.

Proof.

We will prove inequality (11) via induction on the number of independent variables m. For the base case m = 1, observe that there are at most k non-constant functions F_j by definition. If there are fewer than k such functions, we can pad the family with functions that are identically 1 without changing the product, so we may assume without loss of generality that there are exactly k functions F_j. The inequality then follows directly from the generalized Hölder inequality (with all k exponents equal to k).

Now assume we have proven the statement for m independent variables; we will prove it for m + 1. Let S denote the set of indices j for which F_j is influenced by X_1. We have

𝔼[∏_{j=1}^n F_j] = 𝔼[∏_{j∈S} F_j · ∏_{j∉S} F_j]
   = 𝔼[𝔼[∏_{j∈S} F_j · ∏_{j∉S} F_j ∣ X_2, …, X_{m+1}]]
   = 𝔼[𝔼[∏_{j∈S} F_j ∣ X_2, …, X_{m+1}] · ∏_{j∉S} F_j]   (12)

Here, the first equality is obvious, the second equality follows by the law of total expectation, and the third equality follows since Fj for jS only depends on X2,,Xm+1 and is independent of X1.

After taking the conditional expectation, observe that 𝔼[∏_{j∈S} F_j ∣ X_2, …, X_{m+1}] is a random variable which only depends on X_2, …, X_{m+1}. Moreover, conditioned on X_2, …, X_{m+1}, the functions F_j for j ∈ S depend only on the single variable X_1 and still form a read-k family, so we can apply the inductive hypothesis (conditionally) to claim that

𝔼[∏_{j∈S} F_j ∣ X_2, …, X_{m+1}] ≤ (∏_{j∈S} 𝔼[F_j^k ∣ X_2, …, X_{m+1}])^{1/k}.

After combining this with equation (12), we have

𝔼[∏_{j=1}^n F_j] ≤ 𝔼[∏_{j∈S} 𝔼[F_j^k ∣ X_2, …, X_{m+1}]^{1/k} · ∏_{j∉S} F_j] ≕ A.

For j ∉ S, let G_j = F_j, and for j ∈ S, define

G_j = 𝔼[F_j^k ∣ X_2, …, X_{m+1}]^{1/k},

so that A = 𝔼[∏_{j=1}^n G_j]. Observe that the G_j are again a read-k family on X_2, …, X_{m+1}, so we can again apply the induction hypothesis to claim that

A = 𝔼[∏_{j=1}^n G_j] ≤ (∏_{j=1}^n 𝔼[G_j^k])^{1/k} = (∏_{j=1}^n 𝔼[F_j^k])^{1/k},

where the final equality follows from the definition of G_j together with the law of total expectation. This completes the proof.
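Inequality (11) can be spot-checked numerically. The sketch below (ours, a sanity check rather than a proof) takes the read-2 family of triangle indicators of K_4 with independent Bernoulli(p) edge variables, sets F_T = exp(λ · Y_T) as in the argument above, and verifies 𝔼[∏ F_T] ≤ (∏ 𝔼[F_T^k])^{1/k} by exhaustive enumeration of the 2^6 edge configurations.

```python
from itertools import combinations, product
import math

vertices = range(4)
edges = list(combinations(vertices, 2))
triangles = [[e for e in edges if set(e) <= set(t)] for t in combinations(vertices, 3)]
p, lam, k = 0.3, 0.7, 2        # edge probability, exponential-moment parameter, read-k value

def expect(h):
    """Expectation of h over independent Bernoulli(p) edge variables, by enumeration."""
    total = 0.0
    for bits in product([0, 1], repeat=len(edges)):
        x = dict(zip(edges, bits))
        weight = math.prod(p if b else 1 - p for b in bits)
        total += weight * h(x)
    return total

F = [(lambda x, T=T: math.exp(lam * all(x[e] for e in T))) for T in triangles]
lhs = expect(lambda x: math.prod(Fj(x) for Fj in F))
rhs = math.prod(expect(lambda x, Fj=Fj: Fj(x) ** k) for Fj in F) ** (1.0 / k)
print(lhs <= rhs + 1e-12, lhs, rhs)   # True, plus the two sides of inequality (11)
```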

Using the lemma, we can follow the standard Chernoff bound technique to complete the proof. Let X = ∑_{j=1}^n f_j(X_1, …, X_m); we can apply Markov's inequality for any λ < 0 to obtain

Pr[X ≤ qn] = Pr[exp(λX) ≥ exp(λqn)] ≤ exp(−λqn) · 𝔼[exp(λX)].

As mentioned before, we can take Fj=exp(λfj(X1,,Xm)) and apply inequality (11) to obtain that

𝔼[exp(λX)] = 𝔼[∏_{j=1}^n F_j] ≤ (∏_{j=1}^n 𝔼[F_j^k])^{1/k}.

Combining this inequality with the previous inequality, we now have that

Pr[X ≤ qn] ≤ exp(−λqn) · (∏_{j=1}^n 𝔼[F_j^k])^{1/k} = (∏_{j=1}^n 𝔼[[F_j / exp(λq)]^k])^{1/k}.

Writing out the definition of Fj, we have

[F_j / exp(λq)]^k = exp(λk f_j(X_1, …, X_m)) / exp(λkq).

Define λ' = λk; since λ was arbitrary, we will abuse notation and let λ = λ'. Combining the two previous inequalities, we have that

Pr[X ≤ qn] ≤ (∏_{j=1}^n 𝔼[exp(λ f_j) / exp(λq)])^{1/k}.

Here, the right-hand side is in exactly the same form as in the proof of the Chernoff-Hoeffding theorem under independence, except with an additional exponent 1/k. As a result, we can follow the Chernoff-Hoeffding proof and take q=pϵ to obtain

Pr[X ≤ (p − ϵ)n] ≤ exp(−D(p − ϵ ∥ p) · n/k),

which was our desired result.