
Max-Cut with Multiple Cardinality Constraints

Yury Makarychev, Toyota Technological Institute at Chicago, IL, USA; Madhusudhan Reddy Pittu, Carnegie Mellon University, Pittsburgh, PA, USA; Ali Vakilian, Toyota Technological Institute at Chicago, IL, USA
Abstract

We study the classic Max-Cut problem under multiple cardinality constraints, which we refer to as the Constrained Max-Cut problem. Given a graph G=(V,E), a partition of the vertices into c disjoint parts V1,…,Vc, and cardinality parameters k1,…,kc, the goal is to select a set S⊆V such that |S∩Vi|=ki for each i∈[c], maximizing the total weight of edges crossing S (i.e., edges with exactly one endpoint in S).

By designing an approximate kernel for Constrained Max-Cut and building on the correlation rounding technique of Raghavendra and Tan (2012), we present a (0.858−ε)-approximation algorithm for the problem when c=O(1). The algorithm runs in time O(min{k/ε, n}^{poly(c/ε)} + poly(n)), where k = k1+⋯+kc and n=|V|. This improves upon the (1/2+ε0)-approximation of Feige and Langberg (2001) for MaxCutk (the special case where c=1 and k1=k), and generalizes the (0.858−ε)-approximation of Raghavendra and Tan (2012), which only applies when min{k, n−k}=Ω(n) and does not handle multiple constraints.

We also establish that, for general values of c, it is NP-hard to determine whether a feasible solution exists that cuts all edges. Finally, we present a 1/2-approximation algorithm for Max-Cut under an arbitrary matroid constraint.

Keywords and phrases:
Maxcut, Semi-definite Programming, Sum of Squares Hierarchy
Category:
APPROX
Copyright and License:
© Yury Makarychev, Madhusudhan Reddy Pittu, and Ali Vakilian; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Design and analysis of algorithms
Acknowledgements:
This work was conducted in part while Madhusudhan Reddy Pittu was a visiting student at TTIC.
Related Version:
Full Version: https://arxiv.org/abs/2507.12607
Editors:
Alina Ene and Eshan Chattopadhyay

1 Introduction

Given an undirected graph G=(V,E) on n vertices and a weight function w : E → ℝ+, the Max-Cut problem seeks a subset S⊆V maximizing δw(S) = ∑_{u∈S, v∈V∖S} w({u,v}), the total weight of edges crossing the cut (S, V∖S). Without loss of generality, we assume the weights are scaled so that the total edge weight satisfies ∑_{e∈E} w(e) = 1.

Max-Cut is a fundamental problem in combinatorial optimization and approximation algorithms, with several landmark results, most notably the seminal SDP rounding algorithm of Goemans and Williamson [15], which achieves an α𝖦𝖶 ≈ 0.878 approximation. This approximation ratio is known to be optimal under the Unique Games Conjecture (UGC) [18].

In this work, we study a variant called Constrained Max-Cut, where additional partition constraints are imposed on the solution.

Definition 1 (Constrained Max-Cut).

Given a graph G=(V,E), a weight function w : E → ℝ+, and a set of c partition constraints {(Vi,ki)}_{i∈[c]}, where V1,…,Vc partition V and ki ≤ |Vi|/2 for all i, the Constrained Max-Cut problem asks to find a subset S⊆V such that |S∩Vi|=ki for all i∈[c], maximizing δw(S).

Several well-studied problems are special cases of ConstrainedMaxCut. The Max-Bisection problem corresponds to c=1 and k=n/2, and admits approximation factors close to α𝖦𝖶 – specifically, 0.8776 [2] (see also [14, 25, 9, 17, 11, 16, 22]). More generally, when there is a single cardinality constraint |S|=k (i.e., c=1), the problem is known as MaxCutk [10]. It is also referred to as (k, n−k)-Max-Cut in parameterized complexity; see, e.g., [6, 24].

Definition 2 (Max-Cutk).

Given an undirected graph G=(V,E), a weight function w : E → ℝ+, and an integer k (we assume k ≤ n/2 without loss of generality), the MaxCutk problem seeks a subset S⊆V of cardinality exactly k that maximizes δw(S).

For k=Ω(n), the global correlation rounding technique of Raghavendra and Tan [22] (building on [4]) achieves an αcc ≈ 0.858 approximation. Austrin and Stankovic [3] later showed that this approximation is essentially tight for k < 0.365n. However, when k=o(n), existing results are weaker. Feige and Langberg [10] gave a (0.5+ε0)-approximation for all k, where ε0 is a small universal constant (ε0 < 0.09). Moreover, the pipage rounding technique of Ageev and Sviridenko [1] guarantees a 0.5-approximation for all k.

The special case of ConstrainedMaxCut with c=1 and k=o(n) has applications in pricing in social networks [8], also referred to as influence-and-exploit [13]. In this context, a consumer's valuation depends directly on the usage of their neighbors in the network. Consequently, the seller's optimal pricing strategy may involve offering discounts to certain influencers who hold central positions in the underlying network. Candogan et al. [8] considered a setting with two price types: full and discounted. Specifically, the objective is to maximize the total network influence, subject to the constraint of offering k discounted prices to a small target set of buyers, where k ≪ n. Candogan et al. showed that the resulting problem can be reduced to MaxCutk with k=o(n).

Moreover, in certain settings where diversity among influencers is desirable, it is natural to require that the selected influencers fairly represent different groups. This requirement can be modeled as a ConstrainedMaxCut problem with multiple capacity constraints. In most relevant cases, the number of groups is a small constant, c=O(1). For a comprehensive survey on fair representation in learning on graphs, see [19].

In this paper, we also introduce and study a more general version of the problem: finding a maximum cut subject to a matroid constraint.

Definition 3 (Matroid Max-Cut).

Given an undirected graph G=(V,E), a weight function w : E → ℝ+, and a matroid ℳ=(V,ℐ), the matroid Max-Cut problem is to maximize δw(S) subject to S∈ℬ, where ℬ is the collection of bases of ℳ.

Note that ConstrainedMaxCut and MaxCutk are special cases of the matroid Max-Cut problem: we obtain them when ℳ is a partition matroid or a uniform matroid, respectively. This problem has not been explicitly studied previously. However, an algorithm of Lee, Mirrokni, Nagarajan, and Sviridenko [21] gives a (1/3−ε)-approximation to the more general problem of maximizing a symmetric submodular function subject to a matroid base constraint.

1.1 Our Results and Techniques

Our algorithm for MaxCutk builds on the global correlation rounding technique introduced by Raghavendra and Tan [22], which achieves an αcc ≈ 0.858 approximation in the regime where k=Ω(n). We extend this approach by developing an approximate kernel and applying it in conjunction with correlation rounding, allowing us to handle the challenging case where k=o(n).

This yields an approximation guarantee of (0.858−ε) for MaxCutk across all values of k, improving upon the (0.5+ε0)-approximation of [10] in the sparse regime. Formally:

Theorem 4.

For every ε>0, there is an algorithm that runs in O(min{k/ε, n}^{poly(1/ε)} + poly(n)) time and obtains an (αcc−ε)-approximation to the optimal MaxCutk solution, where αcc ≈ 0.858 is the approximation factor of the Raghavendra–Tan algorithm.

The regime where k=o(n) is particularly challenging, as the correlation rounding approach of Raghavendra and Tan [22] does not extend to this setting. Our algorithm closes this gap by improving upon the (0.5+ε0)-approximation of [10] in the sparse regime. For completeness, we provide a brief overview of the approach of Raghavendra and Tan in Section 1.2, and highlight the key reasons why it breaks down when k=o(n).

We next address the more general setting of multiple constraints, focusing on the case c=O(1).

Theorem 5.

For every ε>0, there is an algorithm that runs in O(min{k/ε, n}^{poly(c/ε)} + poly(n)) time, where k = k1+⋯+kc, and obtains an (αcc−ε)-approximation to the optimal solution of Constrained Max-Cut. In particular, when c=O(1) and ε is fixed, the running time is polynomial in n.

More broadly, we study Max-Cut under an arbitrary matroid constraint ℳ=(V,ℐ), generalizing ConstrainedMaxCut with an arbitrary number of partition constraints, in particular when c=ω(1).

Theorem 6.

There exists a 0.5-approximation algorithm for matroid Max-Cut.

The only prior result applicable to this setting is by Lee et al. [21], who provided a (1/3−ε)-approximation for the more general problem of maximizing a symmetric submodular function subject to a matroid base constraint.

Finally, we show that for general ConstrainedMaxCut with an arbitrary number of constraints, it is NP-hard to decide whether there exists a feasible solution cutting all edges. Formally:

Theorem 7.

Given a graph G=(V,E), a partition of the vertices into V1,…,Vc, and budget parameters k1,…,kc, it is NP-hard to decide whether there exists a feasible solution S such that δ(S)=|E|.

We note that for the standard Max-Cut problem, this decision variant can be solved in polynomial time by testing whether the graph is bipartite.

Our Techniques.

A key technical contribution of our work is the construction of an approximate kernel for ConstrainedMaxCut. Specifically, for a cardinality constraint k, we sort the vertices by their (weighted) degrees as v1,…,vn and define V~ as the set of the top O(k/ε) vertices. While the graph G remains unchanged, we restrict our attention to solutions contained entirely within V~. An optimal solution S~ to MaxCutk over V~ then achieves a cut value that is at least a (1−ε) fraction of the true optimum. In other words,

max_{S~⊆V~, |S~|=k} δw(S~) ≥ (1−ε)·max_{S⊆V, |S|=k} δw(S). (1)

See Theorem 12 in Section 2 for the formal statement.

This reduction is particularly useful because it allows us to focus on problem instances where k=Ω(n). Conceptually, we contract the vertices in V∖V~ into a single super vertex s and then restrict the solution to exclude s. This transforms the sparse regime into one where the desired solution size is a constant fraction of the (reduced) vertex set, enabling the use of correlation rounding techniques that require k=Ω(n).

In contrast, prior work by [24] uses kernelization to design fixed-parameter algorithms for MaxCutk, but their parameter is the value of the optimal solution itself, and they aim for an exact kernel. As a result, their kernel size is polynomial in k, which is insufficient for our purposes. Moreover, their kernelization is sequential and adaptive, while ours is non-adaptive. Our approximate kernel also extends to ConstrainedMaxCut with multiple constraints (c>1), as formalized in Theorem 14.

Once we reduce to an instance with k=Ω(n), we apply the Raghavendra–Tan algorithm [22] to obtain a subset of vertices of size k′ with k′/k ∈ [1−ε, 1+ε], achieving a cut value that is at least an αcc-fraction of the optimum. We then perform a random correction step: we adjust the solution by randomly adding or removing at most εk vertices so that its size is exactly k, incurring only a negligible loss in cut value.

When c>1, however, the rounding procedure of [22] does not directly apply. To handle the setting with multiple constraints, we introduce the notion of α-block independence for SDP solutions, which generalizes the standard notion of α-independence. Informally, an SDP solution is α-block independent if, within each partition Vi, the average correlation between pairs of vertices is at most α.

We first show how to efficiently construct a block-independent solution. Then, by applying the rounding algorithm of [22], we obtain a subset S that approximately satisfies each group constraint: for each i∈[c], the size ki′ = |S∩Vi| lies in the range [(1−ε)ki, (1+ε)ki]. Finally, we apply a random correction step within each group to enforce exact feasibility, while ensuring that the cut value degrades by only a negligible amount.

For matroid Max-Cut, we combine techniques from [1] and [7] to design a linear programming relaxation with integrality gap at most 0.5, which can be solved efficiently. Applying pipage rounding to this relaxation yields a deterministic 0.5-approximation algorithm.

1.2 Preliminaries

Our results heavily rely on the global correlation rounding technique developed in [22]. For completeness, we include the relevant definitions and theorems in this section. A quick summary of the Lasserre hierarchy is provided in Appendix A.

Naive approaches based on variants of hyperplane rounding applied to a two-round Lasserre SDP relaxation for the MaxCutk problem can produce subsets S of expected size k that achieve good approximation guarantees. However, these approaches offer no control over the concentration of |S| around k, due to potentially high correlations between the values assigned to vertices by the SDP solution.

Notation.

We use μ = {μS}_{|S|≤ℓ} to denote a level-ℓ Lasserre pseudo-distribution, where μS : {−1,1}^S → [0,1] is a distribution over partial assignments to the subset S⊆V. Let XS denote the random variables jointly distributed according to μS, and Xi the marginal variable for i∈V under μ{i}. We write μS[XS∈A] for the pseudo-probability that the assignment to S lies in the event A⊆{−1,1}^S. In particular, conditional pseudo-probabilities are expressed as μ_{S∪{i}}[Xi=1 ∣ XS=α], the pseudo-probability that Xi=1 given that XS=α, for α∈{−1,1}^S.

SDP Relaxation.

To leverage the correlations between vertices, [22] employ an ℓ-round Lasserre SDP for MaxCutk with a sufficiently large constant ℓ, formally described in Equation 2.

max ∑_{{i,j}∈E} wij·μ{i,j}[X{i,j} ∈ {(1,−1),(−1,1)}] (2)
s.t. ∑_{i∈V} μ_{S∪{i}}(Xi=1 ∣ XS=α) = k  for all S⊆V, |S| ≤ ℓ−1, α∈{−1,1}^S
μ is a level-ℓ pseudo-distribution.
Measuring Correlations.

One way to measure the correlation between two random variables Xi and Xj is via mutual information, defined as I_{μ{i,j}}(Xi;Xj) = H(Xi) − H(Xi ∣ Xj), where Xi and Xj are sampled according to the local distribution μ{i,j}. An SDP solution is α-independent if the average mutual information between uniformly random vertex pairs is at most α, i.e., 𝔼_{i,j∼V}[I(Xi;Xj)] ≤ α.

Definition 8 (α-independence [22]).

An SDP solution to an ℓ-round Lasserre SDP is α-independent if 𝔼_{i,j∼V}[I_{μ{i,j}}(Xi;Xj)] ≤ α, where μ{i,j} is the local distribution over {i,j}. More generally, if W is a distribution over V, the solution is α-independent w.r.t. W if 𝔼_{i,j∼W}[I_{μ{i,j}}(Xi;Xj)] ≤ α. When unspecified, W is assumed to be the uniform distribution over V.
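As a concrete illustration, the mutual information of a pair of ±1-valued variables is easy to compute from their joint local distribution. The short Python sketch below is our own illustrative code (the dict-based representation and function names are not from [22]); it evaluates I(Xi;Xj) for a perfectly correlated pair and for an independent pair:

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a distribution given as {outcome: prob}."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint law {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), q in joint.items():
        px[x] = px.get(x, 0.0) + q
        py[y] = py.get(y, 0.0) + q
    return entropy(px) + entropy(py) - entropy(joint)

# Perfectly correlated +-1 pair: I = H(X) = ln 2; independent pair: I = 0.
corr = {(1, 1): 0.5, (-1, -1): 0.5}
indep = {(x, y): 0.25 for x in (-1, 1) for y in (-1, 1)}
print(mutual_information(corr), mutual_information(indep))
```

An α-independent solution is then one for which the average of such pairwise quantities, over random vertex pairs, is at most α.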

For many standard rounding schemes, such as halfspace rounding, the variance in the balance of the resulting cut is directly linked to the average correlation among random vertex pairs. Specifically, if the rounding scheme is applied to an α-independent solution, the variance in the cut’s balance is bounded by a polynomial function of α.

Obtaining Uncorrelated SDP Solutions.

If all vertices in a t-round Lasserre SDP solution are highly correlated, conditioning on the value of one vertex reduces the entropy of the rest. Formally, if the solution is not α-independent (i.e., 𝔼_{i,j∼V}[I(Xi;Xj)] > α), then conditioning on a randomly chosen vertex i and its value b decreases the average entropy of the remaining variables by at least α. Repeating this process 1/α times suffices to obtain an α-independent solution. Thus, starting from a t-round Lasserre SDP solution, this process yields a (t−ℓ)-round α-independent solution for some ℓ=O(1/α).

Rounding Uncorrelated SDP Solutions.

Given an α-independent SDP solution, many natural rounding schemes ensure that the balance of the output cut is concentrated around its expectation. Hence, it suffices to construct rounding schemes that preserve the expected balance. Raghavendra and Tan [22] present a simple rounding procedure that preserves the individual bias of each vertex, thereby ensuring the global balance property.

An elegant probabilistic argument from [22] shows how to convert an (ℓ+4/α²+1)-round Lasserre SDP solution into an α-independent ℓ-round solution, while losing only an additive α in the objective value (assuming the optimum is normalized to at most 1).

Lemma 9 ([22]).

There exists t ≤ k such that 𝔼_{i1,…,it∼W} 𝔼_{i,j∼W}[I(Xi;Xj ∣ Xi1,…,Xit)] ≤ 1/(k−1).

Lemma 9 implies the existence of a t ≤ 1/α + 1 such that conditioning on the joint assignment to t randomly sampled vertices reduces the average mutual information between the remaining pairs to at most α.

Theorem 10 ([22]).

For every α>0 and integer ℓ, there exists an algorithm running in time O(n^{poly(1/α)+ℓ}) that finds an α-independent solution to the ℓ-round Lasserre SDP, with objective value at least OPT−α, where OPT is the optimum SDP value.

Theorem 10 implies that there exists t=O(1/α²) such that conditioning on t vertices yields an α-independent solution with probability at least α/2. Since the sampling procedure preserves the marginal biases of the vertices, the SDP objective remains close to optimal in expectation. By Markov's inequality, the value of the conditioned solution drops below OPT−α with probability at most 1/(1+α). Thus, there exists a small subset of vertices such that conditioning on them yields an α-independent solution with near-optimal value.

Algorithm 5.3 of [22] is a rounding scheme that preserves the bias (according to the SDP solution) of every vertex while approximately preserving the pairwise correlations up to polynomial factors. Using numerical techniques, they show that the probability of an edge being cut is at least αcc ≈ 0.858 times its contribution to the SDP objective, implying that the total cut value is at least αcc times the SDP value.

Controlling Cut Balance.
Theorem 11 ([22]).

Given an α-independent solution to two rounds of the Lasserre SDP, let {yi}_{i∈V} denote the rounded output of Algorithm 5.3, and let S = 𝔼_{i∼W}[yi] denote the (random) balance of the resulting cut. Then, Var(S) ≤ O(α^{1/12}).

By applying Chebyshev's inequality to Theorem 11, the number of vertices in the cut lies in the range k ± n·O(α^{1/24}) with high probability. When k/n=Ω(1), we can choose a constant α small enough that the deviation is within εk. A post-processing step can then adjust the set size to exactly k (e.g., by swapping a small number of vertices), which incurs only an O(ε) fractional loss in cut value. However, when k=o(n), the additive error term n·O(α^{1/24}) may significantly exceed k, making it difficult to ensure cardinality feasibility without substantially affecting the objective.

Notation.

For any subset S⊆V and vertex i∈V, we write S+i := S∪{i} and S−i := S∖{i}. Let G=(V,E) be a graph with non-negative edge weights w : E → ℝ≥0. For a parameter r, let Hr⊆V denote the set of the r highest weighted-degree vertices in G. If the vertex set V is partitioned into c disjoint groups V1,…,Vc, then Hr(i)⊆Vi denotes the r highest-degree vertices in part Vi. When the weight function is clear from context, we abbreviate the weighted degree of a vertex v as δ(v) instead of δw(v).

2 Approximate Kernels for Max-Cut with Cardinality Constraints

In Section 2.1, we show that for any instance (G,k) of MaxCutk, one can reduce the graph to a (conditioned) instance (G~,k) with |V~|=O(k/ε) vertices. In Section 2.2, we generalize this construction to the setting with multiple partitions. Specifically, for any instance (G,(k1,…,kc)) of ConstrainedMaxCut with V partitioned into V1,…,Vc, we construct a conditioned instance (G~,(k1,…,kc)) with V~ partitioned into V~1,…,V~c such that |V~i|=O(ki/ε) for every i∈[c].

We use OPT to denote the optimal cut value of a given instance. For the single-group problem MaxCutk on a graph G=(V,E), we define OPT := max_{S⊆V, |S|=k} δ(S). For the multi-group problem ConstrainedMaxCut with partition V=V1∪⋯∪Vc and size constraints k1,…,kc, the optimal value is OPT := max{δ(S) : S=S1∪⋯∪Sc, Si⊆Vi, |Si|=ki for all i∈[c]}.

2.1 Approximate Kernel for 𝐌𝐚𝐱𝐂𝐮𝐭𝒌

Kernel Procedure for 𝐌𝐚𝐱𝐂𝐮𝐭𝒌.

We now describe the approximate kernel construction for the single-group MaxCutk problem.

Input: Graph G=(V,E), cardinality parameter k, and approximation parameter 0 < ε ≤ 1/2.
Output: Reduced graph G~=(V~,E~).

  1. 1.

If ⌈k/ε⌉ ≥ n, return G.

  2. 2.

Otherwise, sort the vertices of G in decreasing order of weighted degree and retain only the top ⌈k/ε⌉ vertices. Merge the remaining vertices into a super vertex s, and return the resulting graph G~.

Note that the super vertex s appears in the output graph G~ only if ⌈k/ε⌉ + 1 ≤ n.
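The procedure above can be sketched in a few lines. The following Python is a minimal illustrative implementation under our own data representation (edges stored as a frozenset-keyed weight dictionary; the function name `maxcut_kernel` is hypothetical), not the paper's code:

```python
import math
from collections import defaultdict

def maxcut_kernel(edges, k, eps):
    """Approximate kernel for MaxCutk (illustrative): keep the ceil(k/eps)
    highest weighted-degree vertices and contract the rest into a super
    vertex 's'.  `edges` maps frozenset({u, v}) -> weight."""
    deg = defaultdict(float)
    for e, w in edges.items():
        for v in e:
            deg[v] += w
    n, h = len(deg), math.ceil(k / eps)
    if h >= n:                        # kernel would not shrink the graph
        return dict(edges)
    keep = set(sorted(deg, key=deg.get, reverse=True)[:h])
    reduced = defaultdict(float)
    for e, w in edges.items():
        u, v = tuple(e)
        u = u if u in keep else 's'   # contract low-degree endpoints into 's'
        v = v if v in keep else 's'
        if u != v:                    # edges inside the contracted set vanish
            reduced[frozenset({u, v})] += w
    return dict(reduced)
```

Since feasible solutions avoid s, edges with both endpoints outside the retained set can be dropped: they are never cut.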

Theorem 12.

For any MaxCutk instance (G,k), let (G~,k) be the reduced instance returned by the MaxCutk kernel procedure above. Then the optimal cut value of MaxCutk on G~, conditioned on not selecting the super vertex s, satisfies

OPT~ := max_{S⊆V~∖{s}, |S|=k} δ(S) ≥ (1−4ε)·OPT.
Proof.

Let h := ⌈k/ε⌉, and recall from the notation in the preliminaries that Hh is the set of the h highest-degree vertices in G. Let S0 be an optimal solution of size k in G with cut value δ(S0)=OPT. We will construct a set ST⊆Hh of size k with value close to OPT.

We iteratively transform S0 into a set contained in Hh by applying Lemma 13 up to k times. At each step t, we replace a vertex jt∈St−1∖Hh with a vertex it∈Hh∖St−1 such that:

δ(St) ≥ (1 − 2/(h−k))·δ(St−1).

Since each step increases |St∩Hh| by one, the process terminates after T ≤ k steps. Therefore,

δ(ST) ≥ OPT·(1 − 2/(h−k))^T ≥ OPT·(1 − 2T/(h−k)) ≥ OPT·(1 − 2k/(h−k)).

Substituting h = ⌈k/ε⌉, we get

δ(ST) ≥ OPT·(1 − 2ε/(1−ε)) ≥ OPT·(1 − 4ε),

where the final bound uses ε ≤ 1/2.

Lemma 13.

Let S⊆V be a subset of size |S| = k ≤ h such that S ⊄ Hh. Then there exist vertices i∈Hh∖S and j∈S∖Hh such that:

δ((S−j)+i) ≥ (1 − 2/(h−k))·δ(S).
Proof.

Since S ⊄ Hh and |S| ≤ h, we have Hh∖S ≠ ∅. Let i∈Hh∖S be a vertex minimizing δ(S,{i}), the total weight of edges between i and S, and let j be any vertex in S∖Hh.

We use the submodularity of the cut function:

δ((S−j)+i) + δ(S) ≥ δ(S+i) + δ(S−j).

Rearranging:

δ((S−j)+i) − δ(S) ≥ [δ(S+i) − δ(S)] + [δ(S−j) − δ(S)]
= (δ({i}) − 2δ(S,{i})) − (δ({j}) − 2δ(S−j,{j}))
= (δ({i}) − δ({j})) + 2δ(S−j,{j}) − 2δ(S,{i})
≥ −2δ(S,{i}),

where the last step uses δ({i}) ≥ δ({j}) (as i∈Hh while j∉Hh) and δ(S−j,{j}) ≥ 0.

Now we bound δ(S,{i}). Since i minimizes δ(S,·) over Hh∖S, we have:

δ(S) = ∑_{v∈V∖S} δ(S,{v}) ≥ ∑_{v∈Hh∖S} δ(S,{v}) ≥ |Hh∖S|·δ(S,{i}),

which, together with |Hh∖S| ≥ h−k, implies:

δ(S,{i}) ≤ δ(S)/(h−k).

Putting everything together:

δ((S−j)+i) ≥ δ(S) − 2δ(S,{i}) ≥ δ(S) − (2/(h−k))·δ(S) = (1 − 2/(h−k))·δ(S),

completing the proof.
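The exchange step used in this proof is easy to simulate. The sketch below (our own illustrative code, not from the paper) performs one swap, choosing i∈Hh∖S minimizing δ(S,{i}) and an arbitrary j∈S∖Hh; on any instance satisfying the lemma's hypotheses, the returned set obeys the guarantee δ((S−j)+i) ≥ (1 − 2/|Hh∖S|)·δ(S):

```python
import itertools
import random

def cut_value(edges, S):
    """Total weight of edges with exactly one endpoint in S."""
    return sum(w for (u, v), w in edges.items() if (u in S) != (v in S))

def weight_to_set(edges, S, i):
    """delta(S, {i}): total weight of edges between vertex i and the set S."""
    return sum(w for (u, v), w in edges.items()
               if (u == i and v in S) or (v == i and u in S))

def exchange_step(edges, S, H):
    """One swap from the proof of Lemma 13 (assumes S is not inside H):
    bring in the i in H \\ S cheapest against S, drop an arbitrary j in S \\ H."""
    i = min(H - S, key=lambda v: weight_to_set(edges, S, v))
    j = next(iter(S - H))
    return (S - {j}) | {i}
```

On a random weighted graph, taking H to be the top-degree vertices and S a set of low-degree vertices, one can check the claimed inequality numerically after each swap.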

2.2 Approximate Kernel for Constrained Max-Cut

Kernel Procedure for Constrained Max-Cut.

We now describe the kernelization procedure for the ConstrainedMaxCut problem with multiple vertex groups.

Input: Graph G=(V,E) with the vertex set partitioned as V=V1∪⋯∪Vc, cardinality constraints k1,…,kc, and approximation parameter 0 < ε ≤ 1/2.
Output: Reduced graph G~=(V~,E~) with V~ partitioned as V~=V~1∪⋯∪V~c.

  1. 1.

For each i∈[c], if ⌈ki/ε⌉ + 1 ≤ ni := |Vi|, retain the top ⌈ki/ε⌉ vertices of Vi by weighted degree and merge the remaining vertices of Vi into a super vertex si.

  2. 2.

    Return the resulting graph G~.

Note that a super vertex si appears in the output graph G~ only if ⌈ki/ε⌉ + 1 ≤ ni. Let Ssuper := {si : si exists in G~} denote the set of all super vertices.

Theorem 14.

For any ConstrainedMaxCut instance (G,k1,,kc), let (G~,k1,,kc) be the reduced instance returned by the ConstrainedMaxCut kernel procedure. Then the optimal value of the reduced instance, conditioned on not selecting any super vertex, satisfies

OPT~ := max{δ(S) : S⊆V~∖Ssuper, |S∩V~i|=ki for all i∈[c]} ≥ (1−4cε)·OPT.
Proof.

Let S0 be an optimal solution to the original instance with δ(S0)=OPT. For each part i∈[c], define Hi := H⌈ki/ε⌉(i), the set of the top ⌈ki/ε⌉ vertices of Vi by weighted degree (as defined in the preliminaries).

We transform S0 into a solution ST with ST∩Vi ⊆ Hi for every i∈[c], while losing only a small fraction of the cut value. At each step t, identify the smallest index p∈[c] for which St∩Vp ⊄ Hp, and apply the local exchange from Corollary 16 to swap a vertex j∈(St∩Vp)∖Hp with a vertex i∈Hp∖(St∩Vp), yielding a new set St+1 with

δ(St+1) ≥ (1 − 2/(⌈kp/ε⌉ − kp))·δ(St).

For each i∈[c], we perform at most ki such exchanges within Vi. Hence, the total cut value at the end satisfies:

δ(ST) ≥ δ(S0)·∏_{i=1}^{c} (1 − 2/(⌈ki/ε⌉ − ki))^{ki}.

Using the inequality (1−x)^m ≥ 1−mx, we get

δ(ST) ≥ OPT·(1 − ∑_{i=1}^{c} 2ki/(⌈ki/ε⌉ − ki)) ≥ OPT·(1 − ∑_{i=1}^{c} 2ε/(1−ε)) ≥ OPT·(1 − 4cε),

where the final bound uses ε ≤ 1/2.

Lemma 15.

Let S⊆V be a subset of size k and let H⊆V be a subset of size greater than k such that S ⊄ H and every vertex in H∖S has weighted degree at least as large as every vertex in S∖H. Then there exist i∈H∖S and j∈S∖H such that:

δ((S−j)+i) ≥ (1 − 2/|H∖S|)·δ(S).
Proof.

The proof is identical to that of Lemma 13: select i∈H∖S minimizing δ(S,{i}) and apply cut submodularity to bound the loss when replacing any j∈S∖H with i.

Corollary 16.

For any subset S⊆V and index p∈[c] such that |S∩Vp| = kp ≤ h and (S∩Vp) ⊄ Hh(p), there exist vertices i∈Hh(p)∖(S∩Vp) and j∈(S∩Vp)∖Hh(p) such that

δ((S−j)+i) ≥ (1 − 2/(h−kp))·δ(S).
Proof.

Apply Lemma 15 with H = (S∖Vp) ∪ Hh(p); since H∖S = Hh(p)∖(S∩Vp), which has size at least h−kp, this finishes the proof.

3 Single Constraint

In this section, we describe our (αcc−ε)-approximation algorithm for MaxCutk, for all values of k. Without loss of generality, we assume k ≤ n/2 due to the symmetry of the cut function.

3.1 Algorithm

Input: Weighted graph G=(V,E) and parameters k ≤ n/2 and 0 < ε ≤ 1/2.
Output: A set S⊆V of size |S|=k.

  1. 1.

    (Preprocessing Step) Let G~=(V~,E~) be the approximate kernel output by the MaxCutk kernel with input (G,k,ε). Note that |V~|=O(k/ε).

  2. 2.

    (SDP and Conditioning)

    1. (a)

Solve a (3+4/ε^120)-round Lasserre SDP relaxation for the MaxCutk problem on the graph G~ (see Section 3.2).

    2. (b)

Apply Theorem 10 with α=ε^60 and ℓ=2 to obtain a 2-level SDP solution that is ε^60-independent and has objective value at least OPT~ − ε^60, where OPT~ is the optimum value of MaxCutk on G~ (conditioned on not selecting the super vertex s). By Lemma 18 (2), we know that OPT~ − ε^60 ≥ (1−ε)·OPT~.

  3. 3.

(Rounding) Apply the rounding algorithm of Raghavendra and Tan (Algorithm 5.3 in [22]) to obtain a (random) set S^. Let ℰ denote the event that |S^| ∈ [k − ε²|V~|, k + ε²|V~|].

  4. 4.

(Correction) If the event ℰ does not occur, return an arbitrary subset S⊆V~∖{s} of size k. Otherwise, adjust S^ by randomly adding or removing vertices to produce a set S of size exactly k, and return S.
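The correction step admits a direct sketch. The following Python helper is our own illustrative code (the name `correct_size` is hypothetical); it randomly pads or trims S^ to size exactly k while never touching the super vertex:

```python
import random

def correct_size(S_hat, V_tilde, k, super_vertex='s', rng=random):
    """Randomly add or delete vertices so that |S| = k exactly,
    drawing additions from V~ minus (S^ and the super vertex)."""
    S = set(S_hat)
    if len(S) > k:
        S -= set(rng.sample(sorted(S), len(S) - k))        # trim at random
    elif len(S) < k:
        pool = sorted(set(V_tilde) - S - {super_vertex})
        S |= set(rng.sample(pool, k - len(S)))             # pad at random
    return S
```

When the event ℰ holds, at most ε²|V~| vertices are added or removed, which is the source of the small loss bounded in Lemma 17.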

3.2 Lasserre SDP

We now describe the SDP used in Step 2 of the algorithm above. Since a full overview of the Lasserre hierarchy is already provided in Appendix A, we only describe the relevant formulation.

After the preprocessing step, we solve the following level-ℓ Lasserre SDP relaxation for MaxCutk on the reduced graph G~, with an additional constraint ensuring the super vertex s is not selected:

max ∑_{{i,j}∈E~} wij·μ{i,j}[X{i,j} ∈ {(1,−1),(−1,1)}] (3)
s.t. ∑_{i∈V~} μ_{S∪{i}}(Xi=1 ∣ XS=α) = k  for all S⊆V~, |S| ≤ ℓ−1, α∈{−1,1}^S
μ_{S∪{s}}(Xs=1 ∣ XS=α) = 0  for all S⊆V~, |S| ≤ ℓ−1, α∈{−1,1}^S (4)
μ is a level-ℓ pseudo-distribution.

3.3 Analysis

Proof of Theorem 4.

By Lemma 18 (1), the optimal value on the kernelized instance G~ is at least (1−4ε)·OPT. After solving the SDP and conditioning, we obtain a solution of value at least (1−ε)·OPT~ ≥ (1−5ε)·OPT.

The expected size of S^ is exactly k, since Algorithm 5.3 of [22] preserves the bias of each vertex. By Theorem 11, the variance of the balance |S^|/|V~| is at most O(ε^{60/12}) = O(ε^5); for simplicity, assume the constant hidden in the O-notation is 1 (in general it can be absorbed into ε). By Chebyshev's inequality, the event ℰ occurs with probability at least 1−ε.

The expected cut value conditioned on ℰ satisfies:

𝔼[δ(S^) ∣ ℰ] ≥ (𝔼[δ(S^)] − ε·OPT)/(1−ε) ≥ ((1−5ε)·αcc − ε)/(1−ε)·OPT ≥ (αcc − 9ε)·OPT. (5)

Let S be the final corrected set. Then:

𝔼[δ(S)] ≥ (1−ε)·𝔼[δ(S) ∣ ℰ] (6)
≥ (1−ε)²·𝔼[δ(S^) ∣ ℰ] (by Lemma 17) (7)
≥ (1−ε)²·(αcc − 9ε)·OPT. (by (5)) (8)

Steps 1 and 4 take O(|E| log |E|) and O(|V|) time, respectively. Steps 2 and 3, which involve solving the Lasserre SDP and rounding, dominate the runtime and require O(|V~|^{poly(1/ε)}) time.

Lemma 17.

Conditioned on the event ℰ and a fixed S^, let S be the set obtained from S^ by randomly adding vertices from V~∖(S^∪{s}) or randomly deleting vertices from S^ (whichever is needed, |k−|S^|| of them) so that |S|=k. Then:

𝔼[δ(S)] ≥ (1−ε)·δ(S^).
Lemma 18.

Let OPT~ denote the optimum value of MaxCutk on G~, conditioned on not selecting the super vertex s.

  1. 1.

OPT~ ≥ (1−4ε)·OPT, where OPT is the optimum value of MaxCutk on G.

  2. 2.

    OPT~ is at least an ε-fraction of the total edge weight in E~.

Proof.

Part (1) follows from Theorem 12. For part (2), we show that a uniformly random subset of V~∖{s} of size k cuts each edge with probability at least ε, so in expectation it cuts at least an ε-fraction of the total edge weight of E~.

Let n′ := |V~∖{s}|. Then 2k ≤ n′ ≤ ⌈k/ε⌉. If an edge e is adjacent to s, it is cut exactly when its other endpoint is selected, which happens with probability k/n′ ≥ ε. Otherwise, e is cut with probability 2k(n′−k)/(n′(n′−1)) ≥ k/n′ ≥ ε.
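Both probability bounds in this proof are elementary and can be checked exactly with rational arithmetic. The following sketch (ours, purely for illustration) verifies them over a grid of parameters k, ε=1/d, and n′:

```python
from fractions import Fraction
from math import ceil

# Exact check of the two cut probabilities used in the proof: an edge at the
# super vertex s is cut with probability k/n', any other edge with probability
# 2k(n'-k)/(n'(n'-1)), and both are >= eps whenever 2k <= n' <= ceil(k/eps).
for k in range(1, 30):
    for d in range(2, 8):                      # eps = 1/d <= 1/2
        eps = Fraction(1, d)
        for n_prime in range(2 * k, ceil(k / eps) + 1):
            p_at_s = Fraction(k, n_prime)
            p_internal = Fraction(2 * k * (n_prime - k), n_prime * (n_prime - 1))
            assert p_internal >= p_at_s >= eps
print("both probability bounds hold on the grid")
```

The inequality p_internal ≥ p_at_s amounts to n′ ≥ 2k−1, which the range of n′ guarantees.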

4 Constant Number of Constraints

In this section, we present our (αcc−ε)-approximation algorithm for ConstrainedMaxCut, the Max-Cut problem with c cardinality constraints. Our primary focus is on instances where the number of vertices to be selected from each part Vi is relatively small, and for this reason we assume that ki ≤ ni/2. (Unlike in MaxCutk, this is not without loss of generality.)

The key observation enabling this extension of [22] to multiple constraints is that the notion of α-independence can be defined locally within each block. Specifically, it suffices to ensure that the average mutual information between vertex pairs within each part is small:

𝔼_{i,i′∼Vj}[I(Xi;Xi′)] ≤ α for all j∈[c].

If this condition holds, then after rounding via Algorithm 5.3 of [22], the size of each intersection |S^∩Vj| concentrates around its expectation. In particular, applying Theorem 11 with W the uniform distribution over Vj, the variance of |S^∩Vj|/|Vj| is bounded by O(α^{1/12}) for every j∈[c]. Therefore, for an appropriate choice of α, we obtain

|S^∩Vj| ∈ [kj(1−ε), kj(1+ε)] simultaneously for all j∈[c],

with probability at least 1ε.

Definition 19 (α-block independence).

An SDP solution to an ℓ-round Lasserre relaxation is α-block independent if 𝔼_{i,i′∼Vj}[I_{μ{i,i′}}(Xi;Xi′)] ≤ α holds for all j∈[c].

To find such solutions, we extend the conditioning technique of [22]. The following procedure begins with an (L+ℓ)-round SDP solution and returns an ℓ-round α-block independent solution, for L=O(c²/α²).

Conditioning Procedure

  1. 1.

For each t∈[L]:

    1. (a)

      Sample a block index jt[c] uniformly at random.

    2. (b)

      Sample a vertex itVjt uniformly at random.

    3. (c)

      Sample Xit from its marginal distribution under the current SDP solution (conditioned on previous outcomes), and condition on this value.

    4. (d)

      If the resulting SDP solution is α-block independent, terminate and return it.
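On a true distribution (as opposed to a pseudo-distribution, which is not an explicit probability table), the effect of one iteration is easy to simulate. The toy Python sketch below is our own illustrative stand-in: it builds a highly correlated distribution over four ±1 variables split into two blocks, conditions on one sampled variable as in steps (a)–(c), and reports the largest average within-block mutual information before and after:

```python
import itertools
import math
import random

def entropy(p):
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def marginal(joint, idxs):
    m = {}
    for a, q in joint.items():
        key = tuple(a[i] for i in idxs)
        m[key] = m.get(key, 0.0) + q
    return m

def mutual_info(joint, i, j):
    return (entropy(marginal(joint, [i])) + entropy(marginal(joint, [j]))
            - entropy(marginal(joint, [i, j])))

def max_avg_block_mi(joint, blocks):
    """Largest, over blocks, of the average pairwise mutual information."""
    return max(sum(mutual_info(joint, i, j)
                   for i, j in itertools.combinations(b, 2))
               / max(1, len(list(itertools.combinations(b, 2))))
               for b in blocks)

def condition(joint, i, b):
    """Condition on X_i = b and renormalize (a true-distribution stand-in
    for conditioning a Lasserre pseudo-distribution)."""
    z = sum(q for a, q in joint.items() if a[i] == b)
    return {a: q / z for a, q in joint.items() if a[i] == b}

random.seed(1)
n, blocks = 4, [[0, 1], [2, 3]]
joint = {a: 0.1 / 2 ** n for a in itertools.product([-1, 1], repeat=n)}
joint[(1, 1, 1, 1)] += 0.45      # nearly perfectly correlated variables,
joint[(-1, -1, -1, -1)] += 0.45  # plus a little uniform noise
before = max_avg_block_mi(joint, blocks)
i = random.randrange(n)
pi = marginal(joint, [i])
b = random.choices([-1, 1], weights=[pi[(-1,)], pi[(1,)]])[0]
after = max_avg_block_mi(condition(joint, i, b), blocks)
print(before, after)  # conditioning on one sampled vertex collapses the correlation
```

In this toy run, a single conditioning already makes the distribution α-block independent for modest α; Lemma 20 quantifies the analogous decrease for pseudo-distributions.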

Lemma 20.

For any L ≥ 2, there exists t ≤ L such that

𝔼_{j1,…,jt∼[c]} 𝔼_{i1∼Vj1,…,it∼Vjt}[ ∑_{j,j′∈[c]} 𝔼_{i∼Vj, i′∼Vj′} I(Xi; Xi′ ∣ Xi1,…,Xit) ] ≤ c²/L.
Proof.

Define the potential function:

ϕt := 𝔼_{j1,…,jt∼[c]} 𝔼_{i1∼Vj1,…,it∼Vjt}[ 𝔼_{j∼[c]} 𝔼_{i∼Vj} H(Xi ∣ Xi1,…,Xit) ].

Conditioned on fixed values of j1,…,jt and i1,…,it, the difference in potentials is:

ϕt − ϕt+1 = 𝔼_{j∼[c]} 𝔼_{i∼Vj}( H(Xi ∣ Xi1,…,Xit) − 𝔼_{jt+1∼[c]} 𝔼_{it+1∼Vjt+1} H(Xi ∣ Xi1,…,Xit+1) )
= 𝔼_{j,jt+1∼[c]} 𝔼_{i∼Vj, it+1∼Vjt+1} I(Xi; Xit+1 ∣ Xi1,…,Xit)
= (1/c²) ∑_{j,j′∈[c]} 𝔼_{i∼Vj, i′∼Vj′} I(Xi; Xi′ ∣ Xi1,…,Xit).

Taking expectation over all random choices of j1,,jt and i1,,it gives:

ϕt − ϕt+1 = (1/c²)·𝔼_{j1,…,jt} 𝔼_{i1,…,it}[ ∑_{j,j′∈[c]} 𝔼_{i∼Vj, i′∼Vj′} I(Xi; Xi′ ∣ Xi1,…,Xit) ]. (9)

Summing (9) over t=0 to L−1, and noting that entropy is non-negative, we get:

∑_{t=0}^{L−1} 𝔼_{j1,…,jt} 𝔼_{i1,…,it}[ ∑_{j,j′∈[c]} 𝔼_{i∼Vj, i′∼Vj′} I(Xi; Xi′ ∣ Xi1,…,Xit) ] = c²(ϕ0 − ϕL) ≤ c².

Therefore, by averaging, there exists t ≤ L for which the expected blockwise mutual information is at most c²/L.

Corollary 21.

If

∑_{j,j′∈[c]} 𝔼_{i∼Vj, i′∼Vj′} I(Xi; Xi′ ∣ Xi1,…,Xit) ≤ α,

then

𝔼_{i,i′∼Vj} I(Xi; Xi′ ∣ Xi1,…,Xit) ≤ α for all j∈[c].
Proof.

Each blockwise average 𝔼_{i,i′∼Vj} I(Xi;Xi′ ∣ ·) is the (j,j) term of the global sum, and mutual information is non-negative.

Theorem 22.

For every α>0 and integer ℓ>0, there exists an algorithm running in time O(n^{ℓ+poly(c/α)}) that finds an α-block independent solution to the ℓ-round Lasserre SDP with value at least OPT−α, where OPT is the optimum value of the (L+ℓ)-round SDP.

Proof.

Set L = 4c²/α². First, solve the (L+ℓ)-round Lasserre SDP relaxation (as described in Section 4.2) to obtain an initial solution.

Next, apply the conditioning procedure described above: for each t∈[L], sample a block index jt∈[c] uniformly at random, then sample a vertex it∈Vjt uniformly, sample Xit from its marginal distribution (after the first t−1 fixings), and condition the SDP solution on that assignment. Continue this process until the resulting pseudo-distribution becomes α-block independent.

We analyze this procedure by appealing to Lemma 20, which shows that:

𝔼_{j1,…,jt∼[c]} 𝔼_{i1∼Vj1,…,it∼Vjt}[ ∑_{j,j′∈[c]} 𝔼_{i∼Vj, i′∼Vj′} I(Xi; Xi′ ∣ Xi1,…,Xit) ] ≤ c²/L = α²/4

for some t ≤ L. By Markov's inequality, the probability that the total conditional mutual information (summed over all block pairs) exceeds α is at most:

Pr[ ∑_{j,j′∈[c]} 𝔼_{i∈V_j, i′∈V_{j′}} I(X_i; X_{i′} ∣ X_{i_1},…,X_{i_t}) > α ] ≤ (α²/4)/α = α/4.

Thus, with probability at least 1 − α/4, the conditioned solution is α-block independent. By Corollary 21, this also implies that:

𝔼_{i,i′∈V_j} I(X_i; X_{i′} ∣ X_{i_1},…,X_{i_t}) ≤ α  for all j ∈ [c],

so the solution satisfies the desired independence property within each block.

Now consider the effect of the conditioning procedure on the SDP objective value. Let SDP denote the value of the SDP after conditioning. Since conditioning preserves expectations, we have 𝔼[SDP] = OPT. To bound the probability that the value drops by more than α, we apply Markov's inequality to the non-negative random variable 1 − SDP:

Pr[SDP < OPT − α] = Pr[1 − SDP > 1 − OPT + α] ≤ (1 − OPT)/(1 − OPT + α) ≤ 1/(1 + α),

where the last inequality uses 0 ≤ OPT ≤ 1.

Separately, as shown earlier, the probability that the conditioned solution fails to be α-block independent is at most α/4. By a union bound, the total failure probability is

α/4 + 1/(1 + α) < 1,

for all α ≤ 1. Hence, there exists a choice of conditioning – i.e., some t ≤ L and an assignment to X_{i_1},…,X_{i_t} – such that the resulting SDP solution is α-block independent and has objective value at least OPT − α.

This outcome can be found by brute-force search over all subsets of up to L = O(c²/α²) variables and their possible assignments. Together with solving the (L+ℓ)-round SDP, the overall runtime is thus

O(n^{ℓ+L}) = O(n^{ℓ+poly(c/α)}),

as claimed.
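As a quick numeric sanity check (not part of the proof), the union bound above needs α/4 + 1/(1+α) < 1 for every 0 < α ≤ 1, so that a good conditioning exists. A minimal Python sketch verifying this on a grid:

```python
# Sanity check of the union bound: the two failure events (losing block
# independence, losing too much SDP value) together have probability
# alpha/4 + 1/(1+alpha), which must stay strictly below 1 on (0, 1].
def failure_bound(alpha: float) -> float:
    """Total failure probability bound from the union bound above."""
    return alpha / 4 + 1 / (1 + alpha)

# Check on a fine grid of alpha values in (0, 1].
worst = max(failure_bound(a / 1000) for a in range(1, 1001))
assert worst < 1
```

The bound approaches 1 as α → 0, which is why the conditioning argument only yields a solution "with positive probability" rather than with high probability; the brute-force search then makes the guarantee deterministic.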


4.1 Algorithm

Input: Weighted graph G=(V,E) with vertex set partitioned as V = ∪_{i=1}^{c} V_i, parameters k_i ≤ |V_i|/2 for i ∈ [c], and 0 < ε ≤ 1/2.
Output: A set S ⊆ V such that |S∩V_i| = k_i for all i ∈ [c].

  1. (Preprocessing Step) Let G~=(V~,E~) be the approximate kernel obtained via the ConstrainedMaxCut kernel procedure with input (G,(k_1,…,k_c),ε).

  2. (SDP and Conditioning)

     (a) Solve a (3+4c²/ε¹²⁰)-round Lasserre SDP relaxation for ConstrainedMaxCut on G~ (see Section 4.2).

     (b) Apply Theorem 22 with α = ε⁶⁰ and ℓ = 2 to obtain a 2-level SDP solution that is ε⁶⁰-block independent and has value at least OPT~ − ε⁶⁰. From Lemma 24, we know OPT~ − ε⁶⁰ ≥ (1−ε)OPT~.

  3. (Rounding) Apply Algorithm 5.3 from [22] to obtain a (random) set S^. Let ℰ_i denote the event that |S^∩V_i| ∈ [k_i − ε²|V_i|, k_i + ε²|V_i|] for each i ∈ [c], and define ℰ := ∩_{i=1}^c ℰ_i.

  4. (Correction) If ℰ does not occur, return an arbitrary feasible set. Otherwise, for each part V_i, randomly add or remove vertices to ensure |S^∩V_i| = k_i, and return the resulting set S.

Lemma 23.

The expected value of the cut returned by Algorithm 4.1 is at least (α_cc − O(ε))·OPT. The running time of the algorithm is O(min{k/ε,n}^{poly(c/ε)} + poly(n)), where k = ∑_{i=1}^c k_i.

4.2 SDP Relaxation

We solve the following level-ℓ Lasserre SDP relaxation for ConstrainedMaxCut on the reduced graph G~:

max ∑_{{i,j}∈E~} w_{i,j} μ_{{i,j}}[X_{{i,j}} ∈ {(1,−1),(−1,1)}] (10)
s.t. ∑_{i∈V~_j} μ_{S∪{i}}(X_i = 1 ∣ X_S = α) = k_j  ∀ j ∈ [c], |S| ≤ ℓ−1, α ∈ {0,1}^S
μ_{S∪{s_j}}(X_{s_j} = 1 ∣ X_S = α) = 0  ∀ j ∈ [c], |S| ≤ ℓ−1, α ∈ {0,1}^S
μ is a level-ℓ pseudo-distribution.

4.3 Analysis

Proof of Lemma 23.

By Lemma 24, we have OPT~ ≥ (1−4cε)OPT, and after solving the SDP and conditioning, the objective remains at least (1−ε)OPT~ ≥ (1−(4c+1)ε)OPT.

Since the SDP solution is ε⁶⁰-block independent, using Theorem 11 with W as the uniform distribution over each V_i, the variance of |S^∩V_i|/|V_i| is O(ε⁵). By Chebyshev's inequality, the event ℰ_i occurs with probability at least 1−ε, so the joint event ℰ = ∩_{i=1}^c ℰ_i occurs with probability at least 1−cε.

The expected value of the cut after rounding is at least α_cc(1−(4c+1)ε)OPT. Conditioning on ℰ, we have:

𝔼[δ(S^) ∣ ℰ] ≥ (𝔼[δ(S^)] − cε·OPT)/(1 − cε)
≥ ((α_cc(1−(4c+1)ε) − cε)/(1 − cε))·OPT ≥ (α_cc − O(cε))·OPT. (11)

Let S be the corrected set after adjusting S^ to satisfy the cardinality constraints exactly. Using the same argument as in Lemma 17, applied sequentially across the c parts, we get:

𝔼[δ(S)] ≥ (1−cε)(1−ε)^c 𝔼[δ(S^) ∣ ℰ] ≥ (α_cc − O(cε))·OPT. (12)

The total running time is dominated by solving the SDP and the brute-force conditioning, which take O(min{k/ε,n}^{poly(c/ε)}) time on the kernel G~. The preprocessing and postprocessing steps take poly(n) time.

Lemma 24.

Let OPT~ be the optimum value of ConstrainedMaxCut on G~ (conditioned on not picking any si).

  1. OPT~ ≥ (1−4cε)·OPT, where OPT is the optimum value of the ConstrainedMaxCut instance on G.

  2. OPT~ is at least an ε fraction of the total edge weight in E~.

Proof.

Part (1) follows from Theorem 14. For (2), consider sampling S = ∪_{i=1}^c S_i, where each S_i is a uniformly random subset of size k_i of V~_i ∖ {s_i}. Let n_i = |V~_i ∖ {s_i}|. Since 2k_i ≤ n_i ≤ k_i/ε, we have ε ≤ k_i/n_i ≤ 1/2.

  1. If an edge is adjacent to s_i, then since s_i is never in S, the edge is cut exactly when its other endpoint lies in S, which happens with probability at least ε.

  2. If both endpoints are in the same V_i, Lemma 18 (2) gives cut probability at least ε.

  3. If the endpoints lie in V_i and V_j (i ≠ j), the probability that exactly one endpoint lies in S is

     (k_i/n_i)(1 − k_j/n_j) + (k_j/n_j)(1 − k_i/n_i) ≥ ε/2 + ε/2 = ε.

So the expected cut value of S is at least an ε fraction of total edge weight in E~.
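The cross-part case of the argument above reduces to a fact about p = k_i/n_i and q = k_j/n_j lying in [ε, 1/2]: the cut probability p(1−q) + q(1−p) is minimized at p = q = ε. A small sketch (with an arbitrary illustrative value of ε) checking this numerically:

```python
# Grid check: for p, q in [eps, 1/2] (p = k_i/n_i, q = k_j/n_j, using
# 2k_i <= n_i <= k_i/eps), the probability that exactly one endpoint of a
# cross-part edge lies in S is p*(1-q) + q*(1-p) >= eps.
eps = 0.05  # illustrative value only
grid = [eps + t * (0.5 - eps) / 200 for t in range(201)]
worst = min(p * (1 - q) + q * (1 - p) for p in grid for q in grid)
assert worst >= eps
```

Since p + q − 2pq is increasing in each variable on [0, 1/2], the minimum on the grid is attained at p = q = ε, with value 2ε − 2ε² ≥ ε.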

Proof of Theorem 5.

Lemma 23 proves this theorem for Algorithm 4.1.

5 Arbitrary Number of Constraints

We consider the general case of ConstrainedMaxCut with an arbitrary number of constraints, potentially c=ω(1). First, we present a 0.5-approximation for the more general problem of Max-Cut under an arbitrary matroid constraint. Next, we establish an NP-hardness result for determining whether the optimal solution in a given instance of ConstrainedMaxCut with an arbitrary number of constraints equals the total number of edges in the graph.

5.1 Approximation Algorithm

Proof of Theorem 6.

Consider the following linear program:

max ∑_{e∈E} w_e y_e (13)
s.t. y_{u,v} ≤ x_u + x_v (14)
y_{u,v} ≤ 2 − (x_u + x_v) (15)
x ∈ 𝒫 (16)

where 𝒫 is the base polytope of the matroid ℳ. We can see that for any given x, the optimal choice for y_{u,v} is min{x_u + x_v, 2 − x_u − x_v}. When x is integral, this function coincides with the indicator of whether edge {u,v} has been cut. Since the matroid polytope admits an efficient separation oracle, we can solve the LP (13) efficiently.
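The claim that min{x_u + x_v, 2 − x_u − x_v} equals the cut indicator for integral x can be checked exhaustively; a minimal Python sketch (illustrative only):

```python
# For integral x, min{x_u + x_v, 2 - (x_u + x_v)} is 1 exactly when the
# edge {u,v} is cut (x_u != x_v) and 0 otherwise.
for xu in (0, 1):
    for xv in (0, 1):
        cut = 1 if xu != xv else 0
        assert min(xu + xv, 2 - (xu + xv)) == cut
```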

Now consider the following non-concave quadratic program:

max ∑_{{u,v}∈E} w_e (x_u + x_v − 2 x_u x_v) (17)
x ∈ 𝒫 (18)

Observe that the function x_u + x_v − 2 x_u x_v also coincides with the cut indicator of edge {u,v} when x is integral. Even though we cannot solve (17) efficiently, we can show that it has no integrality gap and, in fact, that we can round any fractional solution x^ to an integral solution x with value at least that of x^. The two crucial properties we need, both easy to verify, are:

  1. The function ∑_{{u,v}∈E} w_e (x_u + x_v − 2 x_u x_v) is convex in any direction e_u − e_v for distinct u, v ∈ V. Here e_u ∈ {0,1}^V is the indicator vector of vertex u.

  2. The polytope 𝒫 is the base polytope of a matroid, and hence is integral and admits efficient linear optimization.

Given these properties, any fractional solution can be pipage rounded (see [1] and especially Section 3.2 of [7]) to an integral solution with value at least that of the fractional solution. The final observation is that for any x ∈ [0,1]^V, we have

(x_u + x_v − 2 x_u x_v) ≤ min{x_u + x_v, 2 − x_u − x_v} ≤ 2 (x_u + x_v − 2 x_u x_v), (19)

which follows from Lemma 27. This implies that the integral optimum is at least 0.5 times the value of the LP (13), and in fact provides a way to find a rounding with value at least 0.5 times the LP value: solve the LP and pipage round the solution using the quadratic objective. Note that the proof idea for Theorem 6 is essentially the same as the one used in [1] for the Hypergraph Max k-cut with given sizes of parts problem, combined with the pipage rounding for matroids from [7].
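The pipage step can be sketched for the special case of a partition matroid (the general case needs a matroid oracle; all helper names below are our own, not from [1] or [7]). Each step moves along e_u − e_v for two fractional coordinates u, v in the same part and, by convexity of the quadratic objective along that direction, keeps the better endpoint:

```python
import itertools

def cut_value(edges, x):
    """Quadratic objective sum_e w_e*(x_u + x_v - 2*x_u*x_v); equals the
    cut weight when x is integral (0/1)."""
    return sum(w * (x[u] + x[v] - 2 * x[u] * x[v]) for u, v, w in edges)

def pipage_round(edges, parts, x, tol=1e-9):
    """Round a fractional point of a partition-matroid base polytope (the
    coordinates of each part sum to an integer) to an integral point
    without decreasing the quadratic objective."""
    x = dict(x)
    while True:
        frac = {v for v in x if tol < x[v] < 1 - tol}
        if not frac:
            return {v: round(x[v]) for v in x}
        # A part with a fractional coordinate has at least two of them,
        # since its coordinates sum to an integer.
        u, v = next((u, v) for p in parts
                    for u, v in itertools.combinations(p, 2)
                    if u in frac and v in frac)
        t_hi = min(1 - x[u], x[v])    # largest feasible move in +(e_u - e_v)
        t_lo = -min(x[u], 1 - x[v])   # largest feasible move in -(e_u - e_v)
        # Convexity along e_u - e_v: one of the two endpoints is no worse.
        t = max((t_lo, t_hi), key=lambda t: cut_value(
            edges, {**x, u: x[u] + t, v: x[v] - t}))
        x[u] += t
        x[v] -= t

# Toy example: a 4-cycle, one part {a, b, c, d} with cardinality k = 2.
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0), ("d", "a", 1.0)]
parts = [["a", "b", "c", "d"]]
x0 = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5}   # fractional objective 2.0
xr = pipage_round(edges, parts, x0)
assert sum(xr.values()) == 2                     # still a base of the matroid
assert cut_value(edges, xr) >= cut_value(edges, x0) - 1e-9
```

On this toy instance the rounding reaches the bipartition of the cycle, so the integral value strictly exceeds the fractional starting value.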

5.2 Hardness Result

Proof of Theorem 7.

We show a reduction from the 3D matching problem. An instance of the 3D matching problem is a tripartite 3-uniform hypergraph with parts X, Y, Z. The edges are triples (x,y,z) ∈ X×Y×Z. The problem is to decide whether there is a subset of the edges such that every vertex is included in exactly one edge.

The reduction is as follows: For every edge e=(x,y,z), consider the star graph on four vertices with the center labeled e and the leaves labeled (e,x), (e,y), (e,z), respectively. The overall graph G is simply the union of all these stars. The partition matroid consists of a part 𝒫_{x′} for every vertex x′ ∈ X, containing the vertices (e,x) with x = x′; we have parts similarly for the elements of Y and Z. The capacity of every part is exactly 1.

(Completeness) If there is a collection of edges {e_i}_{i∈M} such that every element of X, Y, Z is in exactly one edge, then consider the solution S = ∪_{i∈M} {(e_i, e_i.x), (e_i, e_i.y), (e_i, e_i.z)} ∪ ∪_{i∉M} {e_i}, where e_i.x := e_i∩X (and similarly for y and z). It is easy to see that δ(S) = 1 (every edge of G is cut) and |S∩𝒫_u| = 1 for every u ∈ X∪Y∪Z.

(Soundness) Since every star graph is bipartite, any solution with δ(S) = 1 must have the center on one side and the leaves on the other side of every star. This implies that every solution with δ(S) = 1 is of the form S = ∪_{i∈M} {(e_i, e_i.x), (e_i, e_i.y), (e_i, e_i.z)} ∪ ∪_{i∉M} {e_i} for some M ⊆ E, where E is the collection of triples from the 3D matching instance. The partition matroid constraint |S∩𝒫_u| = 1 for all u ∈ X∪Y∪Z translates exactly to M being a perfect 3D matching. Since it is NP-hard to decide if there is a perfect 3D matching, it is NP-hard to decide if there is a cut S ⊆ V with δ(S) = 1 and |S∩𝒫_i| = k_i for all i ∈ [c] when c = ω(1).
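The completeness direction of the reduction can be checked mechanically on a toy 3D matching instance (the instance and all identifiers below are illustrative, not from the paper):

```python
# Toy instance: X={x1,x2}, Y={y1,y2}, Z={z1,z2};
# triples e1=(x1,y1,z1), e2=(x2,y2,z2), e3=(x1,y2,z2).
# {e1, e2} is a perfect 3D matching.
triples = {"e1": ("x1", "y1", "z1"),
           "e2": ("x2", "y2", "z2"),
           "e3": ("x1", "y2", "z2")}

# Star gadget per triple: center e, leaves (e, u) for each element u of e.
edges = [(e, (e, u)) for e, tri in triples.items() for u in tri]

# Partition: one part per ground element u, holding all leaves (e, u).
parts = {u: {(e, u) for e, tri in triples.items() if u in tri}
         for tri in triples.values() for u in tri}

matching = {"e1", "e2"}
# Solution: leaves of matching triples plus centers of non-matching triples.
S = {e for e in triples if e not in matching}
S |= {(e, u) for e in matching for u in triples[e]}

# Every star edge is cut, and every part contributes exactly one vertex.
assert all((a in S) != (b in S) for a, b in edges)
assert all(len(p & S) == 1 for p in parts.values())
```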

 Remark 25.

In contrast to the above theorem, deciding whether the Max-Cut value equals the total number of edges in the graph is solvable in polynomial time when the number of constraints is constant. Moreover, the decision problem is solvable in quasi-polynomial time when the number of constraints is poly(log n).

References

  • [1] Alexander A. Ageev and Maxim Sviridenko. Pipage rounding: A new method of constructing algorithms with proven performance guarantee. Journal of Combinatorial Optimization, 8:307–328, 2004. doi:10.1023/B:JOCO.0000038913.96607.C2.
  • [2] Per Austrin, Siavosh Benabbas, and Konstantinos Georgiou. Better balance by being biased: A 0.8776-approximation for max bisection. ACM Transactions on Algorithms (TALG), 13(1):1–27, 2016. doi:10.1145/2907052.
  • [3] Per Austrin and Aleksa Stanković. Global cardinality constraints make approximating some max-2-csps harder. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 145:24:1–24:17, 2019. doi:10.4230/LIPICS.APPROX-RANDOM.2019.24.
  • [4] Boaz Barak, Prasad Raghavendra, and David Steurer. Rounding semidefinite programming hierarchies via global correlation. In Annual Symposium on Foundations of Computer Science, pages 472–481, 2011. doi:10.1109/FOCS.2011.95.
  • [5] Niv Buchbinder, Moran Feldman, Joseph Naor, and Roy Schwartz. Submodular maximization with cardinality constraints. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 1433–1452. SIAM, 2014. doi:10.1137/1.9781611973402.106.
  • [6] Leizhen Cai. Parameterized complexity of cardinality constrained optimization problems. The Computer Journal, 51(1):102–121, 2008. doi:10.1093/COMJNL/BXM086.
  • [7] Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740–1766, 2011. doi:10.1137/080733991.
  • [8] Ozan Candogan, Kostas Bimpikis, and Asuman Ozdaglar. Optimal pricing in networks with externalities. Operations Research, 60(4):883–905, 2012. doi:10.1287/OPRE.1120.1066.
  • [9] Uriel Feige, Marek Karpinski, and Michael Langberg. A note on approximating max-bisection on regular graphs. Information Processing Letters, 79(4):181–188, 2001. doi:10.1016/S0020-0190(00)00189-7.
  • [10] Uriel Feige and Michael Langberg. Approximation algorithms for maximization problems arising in graph partitioning. Journal of Algorithms, 41(2):174–211, 2001. doi:10.1006/JAGM.2001.1183.
  • [11] Uriel Feige and Michael Langberg. The RPR2 rounding technique for semidefinite programs. Journal of Algorithms, 60(1):1–23, 2006. doi:10.1016/J.JALGOR.2004.11.003.
  • [12] Noah Fleming, Pravesh Kothari, and Toniann Pitassi. Semialgebraic proofs and efficient algorithm design. Foundations and Trends® in Theoretical Computer Science, 14(1-2):1–221, 2019. doi:10.1561/0400000086.
  • [13] Dimitris Fotakis and Paris Siminelakis. On the efficiency of influence-and-exploit strategies for revenue maximization under positive externalities. Theoretical Computer Science, 539:68–86, 2014. doi:10.1016/J.TCS.2014.04.026.
  • [14] Alan Frieze and Mark Jerrum. Improved approximation algorithms for MAX k-CUT and MAX BISECTION. Algorithmica, 18(1):67–81, 1997. doi:10.1007/BF02523688.
  • [15] Michel X Goemans and David P Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 42(6):1115–1145, 1995. doi:10.1145/227683.227684.
  • [16] Venkatesan Guruswami, Yury Makarychev, Prasad Raghavendra, David Steurer, and Yuan Zhou. Finding almost-perfect graph bisections. In In Proceedings of Innovations in Computer Science, pages 321–337, 2011. URL: http://conference.iiis.tsinghua.edu.cn/ICS2011/content/papers/11.html.
  • [17] Eran Halperin and Uri Zwick. A unified framework for obtaining improved approximation algorithms for maximum graph bisection problems. Random Structures and Algorithms, 20, May 2002. doi:10.1002/rsa.10035.
  • [18] Subhash Khot, Guy Kindler, Elchanan Mossel, and Ryan O’Donnell. Optimal inapproximability results for MAX-CUT and other 2-variable CSPs? SIAM Journal on Computing, 37(1):319–357, 2007. doi:10.1137/S0097539705447372.
  • [19] Charlotte Laclau, Christine Largeron, and Manvi Choudhary. A survey on fairness for machine learning on graphs. arXiv preprint arXiv:2205.05396, 2022. doi:10.48550/arXiv.2205.05396.
  • [20] Monique Laurent. A comparison of the sherali-adams, lovász-schrijver, and lasserre relaxations for 0-1 programming. Mathematics of Operations Research, 28(3):470–496, 2003. doi:10.1287/MOOR.28.3.470.16391.
  • [21] Jon Lee, Vahab S. Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. Maximizing nonmonotone submodular functions under matroid or knapsack constraints. SIAM Journal on Discrete Mathematics, 23(4):2053–2078, 2010. doi:10.1137/090750020.
  • [22] Prasad Raghavendra and Ning Tan. Approximating CSPs with global cardinality constraints using SDP hierarchies. In Proceedings of the Symposium on Discrete Algorithms, pages 373–387. SIAM, 2012. doi:10.1137/1.9781611973099.33.
  • [23] Thomas Rothvoß. The lasserre hierarchy in approximation algorithms. Lecture Notes for the MAPSP, pages 1–25, 2013.
  • [24] Saket Saurabh and Meirav Zehavi. (k,n−k)-max-cut: An 𝒪*(2^p)-time algorithm and a polynomial kernel. Algorithmica, 80:3844–3860, 2018.
  • [25] Yinyu Ye. A .699-approximation algorithm for max-bisection. Mathematical Programming, 90:101–111, 2001. doi:10.1007/PL00011415.

Appendix A Basics of the Lasserre SDP Hierarchy

For detailed expositions on the sum-of-squares hierarchy, we refer the reader to the excellent surveys by Laurent [20], Rothvoß [23], and Fleming et al. [12]. This section briefly summarizes key ideas drawn from these sources.

Given a binary optimization problem with a linear relaxation defined by a matrix 𝑨 ∈ ℝ^{m×n} and right-hand side 𝒃 ∈ ℝ^m, consider the feasible region K = {x ∈ ℝ^V : 𝑨x ≥ 𝒃}.

We ask: how can we systematically strengthen this relaxation to better approximate the convex hull of integral solutions, conv(K ∩ {−1,1}^V)? (We work over {−1,1}^V rather than {0,1}^V since this is the natural domain for problems like Max-Cut, where signs encode partition membership.)

Points in this convex hull can be interpreted as distributions over the hypercube {−1,1}^V. The level-ℓ Lasserre SDP yields a pseudo-distribution μ = {μ_S}_{|S| ≤ ℓ}, where each μ_S : {−1,1}^S → [0,1] is a distribution over partial assignments to the subset S ⊆ V. However, there need not exist a global distribution whose marginals agree with these μ_S. Despite this, the pseudo-distribution satisfies the following key properties:

  1. Marginal Consistency: The pseudo-distributions are consistent across overlapping subsets. That is, for any subsets S, T ⊆ V with |S|, |T| ≤ ℓ and any assignment a ∈ {−1,1}^{S∩T}, we have:

     μ_{S∩T}(a) = ∑_{b∈{−1,1}^{S∖T}} μ_S(a∘b) = ∑_{c∈{−1,1}^{T∖S}} μ_T(a∘c), (20)

     where a∘b denotes the extension of a to S using b, and similarly a∘c for T.

  2. Conditioning: The SDP solution supports conditioning on the value of any variable i ∈ V. Given a level-ℓ pseudo-distribution and a variable i, there exist level-(ℓ−1) pseudo-distributions μ⁽⁺⁾, μ⁽⁻⁾ and a weight λ ∈ [0,1] such that for all S ⊆ V with |S| ≤ ℓ−1 and α ∈ {−1,1}^S,

     μ_S(α) = λ μ_S⁽⁺⁾(α) + (1 − λ) μ_S⁽⁻⁾(α), (21)

     where μ_S⁽⁺⁾(α) is nonzero only if α(i) = +1 and μ_S⁽⁻⁾(α) is nonzero only if α(i) = −1.

While these properties are also satisfied by weaker hierarchies like Sherali–Adams, the Lasserre hierarchy is uniquely characterized by an additional sum-of-squares condition: for every polynomial q(x) of degree at most ℓ, the pseudo-expectation of its square is non-negative:

𝔼_μ[q(x)²] ≥ 0. (22)

Assuming polynomials are multilinear (since we evaluate over the hypercube), any such polynomial p(x) can be written as p(x) = ∑_S c_S y_S(x), where y_S(x) := ∏_{i∈S} x_i and |S| ≤ ℓ. The pseudo-expectation then becomes:

𝔼_μ[p(x)] = ∑_S c_S ∑_{α∈{−1,1}^S} μ_S(α) y_S(α).

Moreover, to incorporate the linear constraints 𝑨x ≥ 𝒃, the Lasserre relaxation requires that:

𝔼_μ[q(x)² (∑_{j∈V} A_{i,j} x_j − b_i)] ≥ 0  ∀ i ∈ [m], for all q(x) of degree at most ℓ−1. (23)

A level-ℓ pseudo-distribution satisfying (22) and (23) can be found by solving a semidefinite program, as described below.

A.1 SDP Formulation

Definition 26 (Lasserre Hierarchy [23]).

Let K = {x ∈ ℝ^V : 𝐀x ≥ 𝐛}. The level-t Lasserre relaxation, denoted LAS_t(K), consists of vectors y ∈ ℝ^{2^V} satisfying:

M_t(y) = (y_{I∪J})_{|I|,|J| ≤ t} ≽ 0,
M_t^ℓ(y) = (∑_{i=1}^n A_{ℓ,i} y_{I∪J∪{i}} − b_ℓ y_{I∪J})_{|I|,|J| ≤ t−1} ≽ 0  ∀ ℓ ∈ [m],
y_∅ = 1.

Here, M_t(y) is the moment matrix, and M_t^ℓ(y) is the moment matrix of the slacks. The projection onto the original variables is denoted by LAS_t^{proj}(K) := {(y_{{1}},…,y_{{n}}) : y ∈ LAS_t(K)}.

This is a valid relaxation: for any integral solution x ∈ K ∩ {0,1}^n, the assignment y_S = ∏_{i∈S} x_i for all S ⊆ [n] yields a feasible point in LAS_t(K).

Each variable yS represents the pseudo-moment corresponding to all variables in S being assigned +1. From these, one can recover the pseudo-distribution via Möbius inversion:

μ_S(𝟙_{S′,S}) = ∑_{S∖S′ ⊆ T ⊆ S} (−1)^{|T∩S′|} y_T, (24)

where 𝟙_{S′,S} ∈ {−1,1}^S denotes the partial assignment that sets the variables in S′ to −1 and the remaining ones in S∖S′ to +1.
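The Möbius-inversion recovery of a distribution from its moments can be illustrated on a genuine (rather than pseudo-) distribution. Writing y_T = Pr[all variables in T equal 1] over 0/1 variables, inclusion–exclusion over the variables forced to 0 recovers the probability of each partial assignment; a small exhaustive sketch (helper names are our own):

```python
import random
from itertools import combinations, product

def subsets(s):
    s = sorted(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

random.seed(0)
n = 4
# A random true distribution over {0,1}^n.
weights = [random.random() for _ in range(2 ** n)]
total = sum(weights)
prob = {pt: w / total for pt, w in zip(product((0, 1), repeat=n), weights)}

def y(T):
    """Moment y_T = Pr[x_i = 1 for all i in T]."""
    return sum(p for pt, p in prob.items() if all(pt[i] == 1 for i in T))

S = frozenset({0, 1, 2})
for Sp in subsets(S):  # Sp: coordinates of S assigned 0, rest of S assigned 1
    lhs = sum(p for pt, p in prob.items()
              if all(pt[i] == 0 for i in Sp) and all(pt[i] == 1 for i in S - Sp))
    # Inclusion-exclusion over the coordinates forced to 0:
    rhs = sum((-1) ** len(T) * y(T | (S - Sp)) for T in subsets(Sp))
    assert abs(lhs - rhs) < 1e-12
```

For a pseudo-distribution the same algebra applies, except that the recovered values μ_S are only guaranteed to be consistent and non-negative up to the level of the hierarchy.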

Appendix B Omitted proofs

Lemma 27.

For any x, y ∈ [0,1], (x + y − 2xy) ≤ min{x + y, 2 − x − y} ≤ 2(x + y − 2xy).

Proof.

We consider the following cases.

  1. When x + y ≤ 1, the required inequality is

     x + y − 2xy ≤ x + y ≤ 2(x + y − 2xy).

     The leftmost inequality is equivalent to 0 ≤ xy, which is trivially true. The rightmost inequality is equivalent to 4xy ≤ x + y, which is true because 4xy ≤ (x+y)² ≤ x + y.

  2. When x + y ≥ 1, substitute x′ = 1 − x and y′ = 1 − y; note that x + y − 2xy = x′ + y′ − 2x′y′ and min{x + y, 2 − x − y} = x′ + y′. The required inequality becomes

     x′ + y′ − 2x′y′ ≤ x′ + y′ ≤ 2(x′ + y′ − 2x′y′).

     The conditions on x′, y′ are that they lie in [0,1] and that x′ + y′ ≤ 1. This is exactly the case above.
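A grid check of Lemma 27 (illustrative only; the two-case argument above is the actual proof):

```python
# Check (x + y - 2xy) <= min{x + y, 2 - x - y} <= 2*(x + y - 2xy)
# on a 101 x 101 grid over [0,1]^2.
pts = [i / 100 for i in range(101)]
for x in pts:
    for y in pts:
        q = x + y - 2 * x * y
        m = min(x + y, 2 - x - y)
        assert q <= m + 1e-12 and m <= 2 * q + 1e-12
```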

Proof of Lemma 17.

Suppose |S^| < k; then S is obtained by adding a uniformly random set of k − |S^| vertices from V~ ∖ (S^ ∪ {s}) to S^. The value |V~| is equal to k/ε + 1 if k/ε + 1 ≤ n, and to n otherwise. In the first case, the probability of an element of V~ ∖ S^ being added to S^ is at most (k − |S^|)/(k/ε − |S^|) ≤ ε. In the second case, it is (k − |S^|)/(n − |S^|). For this to be at most ε, it suffices to have |S^| ≥ (k − εn)/(1 − ε), which holds because, in fact, we can show that

(k − εn)/(1 − ε) ≤ k − ε²n ≤ |S^|.

The leftmost inequality is equivalent to k ≤ n(1 − ε + ε²), which is true because k ≤ n/2 ≤ n(1 − ε + ε²). Using Lemma 28, we conclude that 𝔼[δ(S)] ≥ (1 − ε)·δ(S^).

If |S^| > k, then S is obtained by removing a uniformly random subset of size |S^| − k from S^. The probability of an element of S^ being removed is (|S^| − k)/|S^|. For this to be at most ε, it suffices to have |S^| ≤ k/(1 − ε), which holds because, in fact, |S^| ≤ k + ε²|V~| ≤ k/(1 − ε). The rightmost inequality is equivalent to |V~| ≤ k/(ε(1 − ε)), which is true because |V~| ≤ k/ε + 1 ≤ k/(ε(1 − ε)). Using Lemma 28 on V~ ∖ S^, we get 𝔼[δ(S)] ≥ (1 − ε)·δ(S^).

Lemma 28 (Lemma 2.2 in [5]).

For any set S ⊆ V, if R ⊆ V ∖ S is a random set such that each element of V ∖ S is included in R (not necessarily independently) with probability at most p, then

𝔼[δ(S ∪ R)] ≥ (1 − p)·δ(S). (25)

This fact is generally true for any non-negative submodular function.
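Lemma 28 can be sanity-checked by exact enumeration on a tiny graph, in the special case where each element is included in R independently with probability exactly p (the lemma allows dependence and probabilities at most p; the graph and names below are illustrative):

```python
from itertools import product

def cut(edges, S):
    """Weight of edges with exactly one endpoint in S."""
    return sum(w for u, v, w in edges if (u in S) != (v in S))

edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 3.0), (0, 2, 1.0)]
S = {0, 2}
outside = [1, 3]
p = 0.3

# Exact expectation of delta(S u R) over all 2^|outside| choices of R.
expected = 0.0
for bits in product((0, 1), repeat=len(outside)):
    R = {v for v, b in zip(outside, bits) if b}
    pr = 1.0
    for b in bits:
        pr *= p if b else (1 - p)
    expected += pr * cut(edges, S | R)

assert expected >= (1 - p) * cut(edges, S) - 1e-12
```

On this instance the bound is tight: every cut edge has exactly one endpoint outside S and survives with probability 1 − p, and no edge joins two outside vertices, so the expectation equals (1 − p)·δ(S) exactly.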