Abstract 1 Introduction 2 Preliminaries 3 Memory-query tradeoffs in the random query model References

Query Lower Bounds for Correlation Clustering Under Memory Constraints

Sumegha Garg ORCID Rutgers University, New Brunswick, NJ, USA Songhua He ORCID Rutgers University, New Brunswick, NJ, USA Periklis A. Papakonstantinou ORCID Rutgers University, New Brunswick, NJ, USA
Abstract

This work initiates the study of memory–query tradeoffs for graph problems, with a focus on correlation clustering. Correlation clustering asks for a partition of the vertices that minimizes disagreements: non-edges inside clusters plus edges across clusters. Our first result is a tight query lower bound: to output a partition whose cost approximates the optimum up to an additive error of εn2, any algorithm requires Ω(n/ε2) adjacency-matrix queries. Under memory constraints, we show that even for the seemingly easier task of approximating the optimal clustering cost (without producing a partition), any algorithm in the random query model must make n/ε2 adjacency-matrix queries. Finally, we prove the first general graph model query lower bound for correlation clustering, where algorithms are allowed adjacency-matrix, neighbor, and degree queries. The latter two bounds are not yet tight, leaving room for sharper results.

Keywords and phrases:
correlation clustering, query-space complexity, information theory
Funding:
Songhua He: The author was partially supported by Karthik C. S. through the National Science Foundation under Grant CCF-2443697.
Copyright and License:
[Uncaptioned image] © Sumegha Garg, Songhua He, and Periklis A. Papakonstantinou; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Lower bounds and information complexity
Acknowledgements:
The authors would like to thank anonymous reviewers for their valuable suggestions. Songhua He would also like to thank Karthik C. S. for generously providing partial research fellowship.
Editor:
Shubhangi Saraf

1 Introduction

How much information must an algorithm gather from a massive graph to solve a fundamental task like clustering, and how does this cost increase when the algorithm is constrained by limited memory? This work investigates these questions for the problem of Correlation Clustering (CC) – a cornerstone of machine learning and network analysis. We establish the precise information-theoretic requirements of solving CC in sublinear time (i.e., in terms of query complexity) and, importantly, initiate the study of query-space tradeoffs for graph problems.

Given an input graph for the correlation clustering problem, we interpret edges as pairs of “similar” (+) items and non-edges as pairs of “dissimilar” () items. The goal is to partition the vertices into clusters to minimize the total number of “disagreements”: the number of similar pairs that are cut apart plus the number of dissimilar pairs placed in the same cluster. This problem is closely related to other classic cut problems, such as Max-Cut and Minimum Bisection, and our first two hardness results also apply directly to these.

Correlation clustering has been extensively studied in the literature in various models and settings. The problem is known to be APX-hard [16]. After many years of effort, an exciting line of works has improved the approximation factor significantly from O(1) to 1.437 [6, 16, 1, 17, 22, 21, 23, 15, 14, 13]. For sublinear-time algorithms, the best-known approximation ratio is improved to the same as the best ratio 1.437 of polynomial-time algorithms [20, 5, 23, 13]. Efficient algorithms and lower bounds for this problem have also been studied in the Massively Parallel Computation (MPC) setting [7, 15, 23], dynamic setting [9], streaming model [10, 20, 5, 4, 8, 35, 12], and the query model [11] with unbounded time. In this work, we study feasibility of approximate correlation clustering (with additive error) in various query models and under memory constraints.

Our work makes three primary contributions towards understanding the query complexity of correlation clustering. Throughout, an additive error of εn2 means the algorithm’s solution cost may be at most εn2 worse than the true optimum. We remark that our main conceptual and technical contribution is the memory-query tradeoff for approximating the optimal clustering cost (in the random query model). However, to put things in proper context, we will present this as our second result in the introduction.

A tight query bound in the standard model

First, we consider the classic setting. An algorithm with unlimited memory that can adaptively query any pair of vertices to see if an edge exists. This is the adjacency-matrix query model. We show the following, asymptotically tight, query lower bound for finding a nearly optimal clustering.

Theorem 1 (Informal)111We do not include its formal statement and full proof here due to space constraints; a full version with complete details will be posted online.

Every randomized algorithm that finds a clustering within an additive error of O(εn2) must make Ω(n/ε2) adaptive adjacency-matrix queries.

This result improves a previous Ω(n/ε) lower bound given in [11] and perfectly matches the known upper bound in the same work [11], settling the query complexity of CC in the adjacency-matrix query model. Next, we show that for the seemingly easier problem of approximating the optimal clustering cost, the query complexity increases dramatically under memory constraints, albeit in the random query model.

A query-space tradeoff for approximating the optimal clustering cost

What happens when memory is scarce? To understand the additional query cost of limited memory algorithms, we turn to a model where the algorithm receives a stream of uniformly random vertex pairs and indicators of whether these pairs connect or not in the underlying graph. This is the random-query model. For computing boolean functions, time-space tradeoffs in the random query model have been introduced and studied in [36, 24]. In our second result, the task is to estimate the value of the optimal clustering, a seemingly easier goal than finding the partition.

Theorem 2 (Informal – restated as Theorem 1)

In the random-query model, an algorithm using only γn bits of memory to estimate the optimal CC cost, within additive error of εn2, needs Ω(min{q,n3/2γ}) queries where

q={nε2γif γ<1nε2γ2if γ1

To the best of our knowledge, this is the first query-space tradeoff for any approximation problem as well as a graph problem in the random query model. It demonstrates that memory is not free: with sub-polynomial space (when γ=o(1)), the query complexity blows past the O~(n/ε2) baseline, hitting a barrier of Ω~(n2) for even modest approximation goals. Our lower bound for the case γ1 completes the picture and shows that the query complexity remains high even when the memory is large. Memory and queries are provably and strictly intertwined.222Here is a word of caution on interpretation. Relative to Theorem 1, in Theorem 2 we change three parameters of the setting. (i) We impose a memory cap. (ii) We switch to the random–query model [36]. (iii) We study the value (cost) problem rather than the search problem. Our tradeoff should be read and contrasted with the other results in light of all three changes. The proof of the above theorem faces various technical challenges, which we elaborate on in Section 1.1.

Compared to Theorem 1, Theorem 2 weakens the setting in two ways: random queries instead of arbitrary queries, but turning to the cost variant of the problem. We now explain the reason for doing so. First, if instead of the random-query model we allowed arbitrary queries, then the same lower bound would hold for Turing machines using space clog2n, for a constant c>1. Finding a natural problem for which such a bound holds is conceptually difficult. Thus, restricting how we access the input circumvents this difficulty and makes the problem amenable to analysis. Second, we should explain why we use the cost/value formulation. There are two reasons. First, clarity. Second, which is crucial here, under O(n) space, the search formulation is vacuous. Merely writing down a near-optimal partition already needs Ω(n) bits (even for two clusters). Intuitively, any lower bound would then be dominated by output size rather than information. Focusing on value estimation removes this artifact and isolates the genuine memory–query tradeoff. This seems to be the “correct” problem variant to study here.

A lower bound in the general graph model

Finally, we consider a stronger query model, the general graph model. In this model, an algorithm can make pair, degree, and neighbor queries. This model was introduced and studied in [33, 2]. This model is significantly stronger, as neighbor queries can reveal global graph structure and create complex statistical dependencies that are challenging to analyze. Specifically, it is known from [33] that testing bipartiteness in the general graph model is strictly stronger than algorithms in the adjacency list or (exclusively) adjacency matrix models alone. However, proving lower bounds in the general model is conceptually difficult.

Theorem 3 (Informal)

Even in the general graph model, any algorithm that finds a clustering with an additive error of εn2 requires Ω(n/ε) queries.

While this bound is not as tight as Theorem 1, it establishes for the first time that CC remains information-theoretically hard even when algorithms are equipped with the strongest local query arsenal. In our proof, we deal with the challenges of this model by analyzing query interactions on carefully constructed regular graphs (graphs where symmetry in some sense turns neighbor queries useless).333The formal statement and full proof are omitted for brevity; a full version including details will be available separately.

Our results set forth the first query–space tradeoffs for graph problems. Beyond settling the adjacency–matrix complexity and showing hardness in the general model, Theorem 2 in some sense isolates the “price of memory”. Its proof provides a Boolean–Fourier and information-theoretic toolkit for random–query lower bounds that could potentially be of independent interest. We offer Correlation Clustering as the canonical testbed and this paper as the beginning of a broader research direction on query–space tradeoffs across graph primitives.

1.1 Our techniques

Due to space constraints, we only sketch the proof ideas of our main result here–the memory-query tradeoffs for correlation clustering (Theorem 1); the full version will include the proof techniques for the other two results.

To illustrate the proof techniques, we fix ε=n1/4 and consider (polylogn)-memory algorithms that approximate the clustering cost within an additive error of εn2. Given a graph G on n vertices, in the random query model, at each time step t, the algorithm receives a uniformly random vertex pair (u,v) along with the indicator bit specifying whether (u,v) is an edge of G. Using a VC-dimension argument [11], it follows that O(n/ε2)=O(n1.5) random queries suffice to produce a partition whose cost is within an additive error of εn2=n7/4 from optimal. Although representing such a partition requires Ω(n) bits of memory, it is natural to expect that just approximating the optimal clustering cost might be possible with significantly less memory444Indeed, low-space streaming algorithms are known for certain approximation parameters [10, 4].. Before our work, it was not known whether even O(logn)-memory algorithms could achieve an additive O(n7/4)-approximation using only O(n1.5) random queries555While [4] established polylogn-space hardness for approximating the clustering cost within an additive error of o(n2) in the worst-case streaming model, proving hardness in the random query model introduces new technical challenges, even relative to the random-order streaming setting.. We show that any polylogn-memory algorithm achieving this guarantee must in fact use Ω~(n7/4)n1.5 random queries.

The random-query model was introduced in [36] to study memory-query tradeoffs for computing n-bit Boolean functions, when the algorithm receives a random input bit at each time step. This and the subsequent work of [24] established non-trivial tradeoffs for functions with high sensitivity and total influence, respectively. However, for promise or approximation problems, no small set of edges has a strong influence, and therefore, these prior techniques do not apply to our setting. Instead, our result builds on the approach of [31], which established the hardness of approximating MAX-CUT (and the clustering cost) within a multiplicative factor of (2δ) over sparse graphs, in the one-pass streaming model. Specifically, the paper leveraged tight lower bounds for a two-player one-way communication problem – called the Distributional Boolean Hidden Partition (D-BHP) problem – to show that any (random-order) streaming algorithm that distinguishes between a bipartite graph and a random graph must use Ω(n) bits of memory.

There are two main challenges in extending this approach to prove an Ω~(n7/4) random-query lower bound for additive approximation of the optimal clustering cost (under memory constraints). First, [31] establishes lower bounds only for sparse graphs with O(n) edges; for such sparse graphs, an additive approximation of O(n1.5) to the clustering cost is trivial. Second, in the random-query model, after Ω(n) queries, edge repetitions occur with high probability, unlike in the random-order one-pass streaming model, where each edge is guaranteed to appear exactly once. The ability to go over the same edge twice significantly increases the technical difficulties in proving hardness results666Only recently, the breakthrough work of [25] showed that any constant-pass streaming algorithm for distinguishing between a sparse bipartite graph and a random graph requires Ω(n1/3) bits of space. Since multi-pass streaming algorithms read the stream in the same order, this model is incomparable to the random query model..

To overcome the first challenge, we introduce and study a noisy variant of D-BHP, which we call the Perturbed Distributional Boolean Hidden Partition (PD-BHP) problem. This formulation allows us to work with dense graphs that satisfy the bipartite partition “approximately”. Starting with the Boolean Hidden Matching (BHM) problem introduced by [26], BHM, D-BHP, and their variants have been extensively used to prove streaming and sketching lower bounds (see, e.g., [37, 30, 31, 34, 28, 32, 29, 27, 18, 3]). The tight bounds we establish for PD-BHP may therefore be of independent interest. Similar to [31], we use a hybrid argument to lift the two-player one-way communication lower bound to a multi-player one-way communication bound, where each player receives an independent graph obeying the hidden partition. In the random-query model, however, there is a single underlying graph G dictating the inputs to all players; in particular, repeated queries to the same edge must be consistent with previous observations. This brings in the second challenge. To overcome it, we show that any algorithm capable of distinguishing between the case where all edge queries are consistent and the case where each queried edge is independently resampled must use either polylogn bits of memory or Ω(n2/polylogn) random queries. We refer to this as the same vector problem, and we establish tight memory–query tradeoffs for it.

Tight communication complexity for PD-BHP problem

Consider the following two-player communication problem. Alice receives a uniformly random n-bit vector x{0,1}n. Bob receives a graph G=(V,E) on n vertices with a uniformly random set of r edges, and a vector w{0,1}r. In the NO case, each wi is an independent uniformly random bit. In the YES case, for the ith edge (ui,vi)E, we have

wi={xuixvi,with probability 1210ε,1(xuixvi),with probability 12+10ε.

In contrast, in the D-BHP problem studied by [31], the YES case is noiseless, that is wi=xuixvi. The goal is for Bob to distinguish between the two cases with as little communication from Alice as possible. We show that when Bob receives r=αn/ε2 edges (for some 0<α<1), and Alice sends at most γn bits (for some γ>0), Bob’s distinguishing advantage is at most O((γ+α)α1/2). This bound is tight: if Alice simply sends the first O(n/α) coordinates of x to Bob, then with high probability, the graph G will contain at least Ω(1/ε2) edges whose endpoints lie entirely within these coordinates, enabling Bob to distinguish the two cases with constant advantage. To lift the two-player communication bound to the multi-player setting, we parametrize 0<α=γ when γ<1, and α=O(1/γ2) when γ1. Our proof builds on the Fourier-analytic approach of [31] for the D-BHP problem. A key challenge in extending their proof is that, since Bob now receives n edges, the graph G necessarily contains many cycles (of length at least 3). In contrast, the argument of [31] relies crucially on the fact that when Bob receives n edges, the graph is cycle-free. By carefully balancing the number of cycles against the distinguishing advantage they provide, we are able to extend the Fourier-analytic proof to handle the noisy case on denser graphs.

Same vector problem

Let N=(n2). Consider the following distinguishing problem, where the goal is to decide whether the random queries are “consistent”. At each time step t, the algorithm receives a random index it[N] and a bit bt. In the NO case, bt is uniformly random, whereas in the YES case, bt=xit, where x{0,1}N is fixed at the start. We show that any algorithm that solves the distinguishing problem, in N<T<N time-steps, must use at least Ω(N/T) bits of memory.777Recall that by the Birthday Paradox, the probability of sampling the same index twice is negligible when TN. Our result covers the entire non-trivial range of T where collisions are possible but not guaranteed. This bound is tight (up to log factors): by storing the first s queries, an algorithm will, with high probability, encounter a repeated index within O(N/s) time steps, and thus distinguish with constant advantage. To obtain the tight result, we use an information-theoretic potential function that measures the maximum progress any s-memory algorithm can make towards distinguishing at a time-step. For our main theorem, we generalize the same vector problem to non-uniform bits.

2 Preliminaries

We always denote [k] as the set {1,,k}. For every binary string S, we use |S| to denote its Hamming weight. For every two binary strings S,S of the same length, we use dH(S,S) to denote their Hamming distance.

Basics of information theory

Given a random variable Z, we use H(Z) to denote the Shannon entropy of Z, i.e., H(Z)=zPr[Z=z]log(1/Pr[Z=z]). For a scalar p(0,1), H(p) denotes the entropy of the Bernoulli random variable with p probability to be 1. We use I(X;Y|Z) to denote the mutual information between X and Y conditioned on the random variable Z. I(X;Y|Z)=H(X|Z)H(X|Y,Z), where H(X|Y)=𝔼yH(X|Y=y)H(X). Next, we describe some of the properties of mutual information used in the paper.

  1. 1.

    (Chain Rule) I(X,Y;Z)=I(X;Z)+I(Y;Z|X).

  2. 2.

    For discrete random variables, X,Y,Z, I(X;Y|Z)=0XY|Z.

  3. 3.

    If I(Z1;Z2|X,Y)=0, then I(X;Z2|Y)I(X;Z2|Y,Z1).

  4. 4.

    If I(Z1;Z2|Y)=0, then I(X;Z2|Y)I(X;Z2|Y,Z1).

Property 1 follows from the chain rule for Shannon entropy. For property 2, it is easy to see that if XY|Z, then I(X;Y|Z)=0; the other direction uses strict concavity of the log function. Properties 3 and 4 follow from the observation that

I(X;Z2|Y)+I(Z1;Z2|X,Y)=I(X,Z1;Z2|Y)=I(Z1;Z2|Y)+I(X;Z2|Y,Z1).

As mutual information is non-negative, if I(Z1;Z2|X,Y)=0, then I(X;Z2|Y)I(X;Z2|Y,Z1) (because I(Z1;Z2|Y)0) and if I(Z1;Z2|Y)=0, then I(X;Z2|Y)I(X;Z2|Y,Z1).

Graph notations and problem definitions

Throughout this paper, we use G=(V,E) to denote the input graph. We denote n=|V| as the number of vertices. We use (V2) to denote the set of unordered pairs of vertices. Given two disjoint subsets V1,V2 of a graph G, we use G[V1,V2] to denote the induced bipartite subgraph where V1 and V2 are the two parts. Given a vertex u and a number i, we use N(u,i) to denote the i-th neighbor of vertex u.

We study the following three problems in this work in both versions where the output is an approximately optimal cost or a partition that yields an approximately optimal cost: the correlation clustering problem, the max cut problem, and the minimum bisection problem. The formal definition and statements to the latter two problems are deferred to the full version of the paper.

Correlation clustering

Given an undirected and unweighted graph G=(V,E). A clustering 𝒞 of the graph is a partition C1,C2,,Ck of its vertex set V where k is unfixed. A clustering also defines an equivalence between vertices: u𝒞v if u and v belong to the same cluster, and u≁𝒞v otherwise. The cost of a clustering is defined as the total number of missing edges inside each cluster plus the total number of edges crossing different clusters. Formally, we define it as

costG(𝒞):=|{(u,v)E:u𝒞v}|+|{(u,v)E:u≁𝒞v}|

Fix a parameter ε(0,1/2), the correlation clustering cost problem asks for an approximate value A such that |Amin𝒞costG(𝒞)|εn2 given G as the input. The correlation clustering partition problem asks for a clustering 𝒞 such that costG(𝒞)min𝒞costG(𝒞)+εn2.

A popular and equivalent definition of correlation clustering is based on a signed graph, where each edge is associated with a positive (+) or a negative () sign. The goal is to find a clustering that minimizes the number of disagreements, which corresponds to our cost function costG(𝒞). The problem can also be framed as maximizing the number of agreements. In the additive error setting, these two goals are equivalent, as the sum of agreements and disagreements is the fixed total number of pairs of vertices, (n2). In our paper, we model positive edges as existing edges and negative edges as non-edges, and we focus on the case of a complete signed input graph. Our lower bounds for this complete graph model also imply a lower bound for general graph cases.

3 Memory-query tradeoffs in the random query model

We study the memory-query tradeoffs of the correlation clustering cost problem in the random query model. This model was introduced in [36] and subsequently studied in [24]. Our main result is the first memory-query tradeoff lower bound for graph problems in the random query model. Different from the previous works, we adapt the proof structure from [31], which is based on Fourier analysis.

The random query model

In the random query model, an algorithm can make random queries to the input graph G=(V,E). Each query returns an independent and identically distributed (i.i.d.) uniformly random pair of vertices (u,v) from (V2) and an indicator 𝟙(u,v)E of whether or not (u,v)E. We denote the return of the i-th query as (ui,vi,ei), where ei=𝟙(ui,vi)E.

Theorem 1.

Let G=(V,E) be an undirected simple graph with n vertices. Let Π be any randomized algorithm that, in the random query model, approximates the correlation clustering cost of G to within an additive error of εn2 with probability at least 99/100. For this algorithm, if the worst-case query complexity is q and the space used is at most γn bits, then the following lower bound holds:

q={Ω(min(nε2γ,nnγ))if γ<1Ω(min(nε2γ2,nnγ))if γ1

for parameters ε(ω(1n),0.05) and γ(ω(lognn),o(n)].

We note that our lower bound, which holds for deterministic algorithms, also applies to randomized algorithms by a standard application of Yao’s minimax principle. In addition, the same lower bound applies to the max cut and minimum bisection problems. The three problems share the same hard distribution. We defer the lower bound statement and its proof to the latter two problems to the full version of this paper, and focus on the correlation clustering problem in this section.

Our proof extends the streaming lower bound for max cut in [31]. They showed an Ω~(n) space lower bound for approximating max cut with (2ϵ)-multiplicative error for any constant ϵ>0. We adapt their proof to establish lower bounds for a wide range of additive errors.

The core of our proof strategy is a reduction to two auxiliary problems. The first is a noisy variant of the distributional Boolean Hidden Partition (D-BHP) problem introduced in [26] and extended in [37, 31], and the second is what we call the same vector problem. The desired query-space lower bound follows from establishing lower bounds for these two problems. We will define these auxiliary problems formally in the following subsections.

3.1 Input distribution

We use an input distribution for the correlation clustering that differs from the one in [31]. While they studied the indistinguishability between sparse random graphs, we focus on the indistinguishability between two types of dense graphs: a random dense Erdős-Rényi graph and a random dense Erdős-Rényi graph with Θ(εn2) edges perturbed in expectation.

Let 𝒢n,p denote the distribution of Erdős-Rényi graphs with parameters n and p. In a random graph from this distribution, each of the (n2) possible edges is included with an independent probability p.

We define two input graph distributions, 𝒢Y and 𝒢N.

  • The YES distribution (𝒢Y): A random graph from 𝒢Y is sampled as follows. We first uniformly generate a partition P{0,1}n. For every pair of vertices (vi,vj), we independently include the edge (vi,vj) with probability 1/2+(1)PiPjρ, where ρ=10ε. For a fixed partition P, we denote this random graph distribution by 𝒢PY.

  • The NO distribution (𝒢N): This is simply the standard Erdős-Rényi graph distribution 𝒢n,1/2.

We will show later that the correlation clustering costs of graphs from these two distributions are 2εn2-far from each other with high probability. Therefore, the lower bound reduces to the indistinguishability of graphs from the two distributions 𝒢Y and 𝒢N.

We will also use the graph distribution 𝒢n,r to define the perturbed distributional boolean hidden partition (PD-BHP) problem: we uniformly sample r edges over all pairs of vertices, allowing for repetition; the graph is obtained by taking the union of the sampled edges, which is a graph of rr edges.

Distribution of Queries

We describe the distributions to the random queries the algorithm makes and their query answers. We define the distributions over query streams. Let a query stream be a sequence of t triples ((u1,v1,e1),,(ut,vt,et)), where each pair (ui,vi) is sampled uniformly from (V2).

  • 𝒟tY: The distribution of a stream of t random queries where the graph G=(V,E) is sampled from 𝒢Y, and each ei is set to 𝟙(ui,vi)E.

  • 𝒟tN: The distribution of a stream of t random queries where the graph G is sampled from 𝒢N.

Our lower bound applies to the mixed distribution 𝒟t=12𝒟tY+12𝒟tN.

The indistinguishability of the two distributions is established on the indistinguishability between the two distributions and intermediate distributions. Specifically, fix a number k1 where we assume k divides t. For every l{0,,k}, we let 𝒟t,l to denote the following distribution of query streams: (i) uniformly sample a partition P from the 2n possible partitions of the graph; (ii) let G1,,Gl𝒢PY, and let Gl+1,,Gk𝒢N independently; (iii) for each i[k] and every j=(i1)t/k+1,,it/k, we let ej=𝟙(uj,vj)E(Gi). Intuitively, we divide the input stream into k phases, where the samples of each phase is from an independently sampled graph G from the YES or NO distribution. Our proof idea is as follows

(1): Indistinguishable by lower bounding the same-vector problem
(2): Indistinguishable by lower bounding the PD-BHP problem
Figure 1: Structure of distributions used in the lower bound proof.

Given the distributions, we show that the correlation clustering cost of graphs in 𝒟Y and 𝒟N are 2εn2-far apart from each other with high probability.

Lemma 2.

Let ε(ω(1n),0.05). Then with probability 1nω(1) a random graph G drawn from 𝒢Y has correlation clustering cost at most n245εn2+O~(n); and a random graph G drawn from 𝒢N has correlation clustering cost at least n24O~(nn).

Proof.

We first lower bound the optimal clustering cost of a graph from 𝒢N=𝒢(n,1/2). Observe that for every clustering, every pair of vertices (u,v) contribute 1 cost with independent probability 1/2. Let X denote the cost of an arbitrary clustering. By the standard Chernoff bound,

PrG𝒢N[|X𝔼[X]|nnlogn]2exp(n3log2n3𝔼[X])2exp(43nlog2n)

Since there are at most nn<exp(nlogn) clusterings, by union bound, a random graph G drawn from 𝒢N has correlation clustering cost at least n24O(n3/2logn) with 1exp(nlogn) probability.

For graphs G drawn from 𝒢Y. By our construction to 𝒢Y, each pair of vertices (u,v) contribute 1 cost to costG(P) with independent probability 1/2ρ. Let X denote costG(P). Then 𝔼[X]=n245εn2O(n). By Chernoff bound,

PrG¯𝒢Y[|X𝔼[X]|nlogn]2exp(n2log2n3𝔼[X])2exp(43log2n)

3.2 The boolean hidden partition problem

We analyze the 2-party one-way communication complexity of the perturbed distributional Boolean hidden partition (PD-BHP) problem, a variant of the distributional Boolean hidden partition problem introduced in [31]. In this variant, the input vector is noisy.

Perturbed Distributional Boolean Hidden Partition Problem (PD-BHP)

In this problem, Alice receives a vector x{0,1}n, and Bob receives a graph G=(V,E) and a vector w{0,1}r, where r=|E|. Let M{0,1}r×n be the edge-vertex incidence matrix of G. The problem is to distinguish between two cases based on the relationship between x, w, and G:

  1. 1.

    YES case: w=Mx+Δ, where Δ is a random noise vector with each entry independently set to 1 with probability 1/2+ρ.

  2. 2.

    NO case: The vector w is uniformly random in {0,1}r, independent of x. This is equivalent to w=Mx+Δ, where Δ is a uniformly random vector.

Alice sends a message to Bob, who must distinguish between the two cases.

We analyze this problem under a specific distribution where Alice’s input x is uniformly random in {0,1}n, and Bob’s graph G is sampled from 𝒢n,r, a graph distribution defined by taking a union of r edges sampled over all pairs of vertices, allowing for repetition. The final answer is YES or NO with probability 1/2 each. As a result, the number of edges r in the resulting simple graph is exactly the number of distinct edges from the r samples, and there do not exist two identical rows in the edge-vertex incidence matrix M.

Lemma 3.

Fix three parameters ε(ω(1n),0.05), γ>ω(1nlogn) and α[ε2n,min(107γ2,104)]. Consider an instance of the PD-BHP problem where Alice receives a uniformly random string x{0,1}n, and Bob receives a graph G𝒢n,αn/ε2 together with the corresponding vector w. Any deterministic protocol for the PD-BHP problem that uses at most γn bits of communication can achieve an advantage of at most O((γ+α)α) over random guessing.

This lower bound is tight when both α and γ are constants. In this scenario, Alice can send the first Θ(n) bits of her input to Bob. In expectation, Bob can then learn Θ(1/ε2) entries of the vector Mx, and thus Θ(1/ε2) entries of the noise vector Δ. Since the noise vector in the YES case and the NO case correspond to Bernoulli distributions that are ρ-far apart, Θ(1/ρ2)=Θ(1/ε2) independent samples are sufficient to distinguish them with constant advantage.

This subsection focuses on proving the above lemma. Our proof follows the same framework as [26, 37, 31]. Compared to previous works, our problem involves a key trade-off: while the noise vector Δ makes distinguishing the distributions harder, the denser graphs make it easier. We establish a trade-off between the graph’s density, the noise probability, and the communication complexity.

Specifically, Alice’s message induces a partition A1,A2,,A2c of {0,1}n, where c is the number of bits sent by Alice. Let the protocol’s advantage over random guessing be at least τ, where τ:=(γ+α)α. Notice that at least a 1τ fraction of all input strings x{0,1}n are contained within sets in the partition that have a size of at least τ2nc. We can therefore focus our analysis on one such “typical” set, which we denote by A. Define a parameter c=c+log(1/τ), the size of A is |A|2nc.

The central idea is to show that if Alice’s input x is drawn uniformly from such a typical set A, the resulting vector Mx is statistically close to uniform. If this holds, Bob is unable to distinguish the YES case (where Δ=Mx+w) from the NO case (where Δ is uniformly random).

Our main technical contribution is an extension of Fourier analysis techniques from prior work [26, 37, 31]. While previous work dealt with sparse, cycle-free graphs [31], our analysis is adapted to handle the presence of cycles.

We use the following normalization of the Fourier transform:

f^(v)=12nx{0,1}nf(x)(1)xv

We use the following bounds on the Fourier mass of f contributed by coefficients of various weight:

Lemma 4 (Lemma 6 in [26]).

Let A{0,1}n of size at least 2nc and f its indicator function. Then for every l[4c]

22n|A|2v:|v|=lf^(v)2(42cl)l

Recall that G is the graph Bob receives as input. The number of edges in G is r where rαn/ε2. M{0,1}r×n denotes the incidence matrix of G where Meu=1 iff uV is an endpoint of e(V2). We are interested in the distribution of Mx+Δ where x is uniformly randomly selected from A and Δ is a random noise vector where each entry is 1 with probability 1/2+ρ. For z{0,1}r, let

pM(z)=xAPrΔ[Mx+Δ=z]|A|

Note that pM is a function of the message A. We will prove that pM(z) is close to uniform by bounding the sum of squares of its Fourier coefficients for non-zero weight vectors. Specifically, let Ur denote the uniform distribution over r-bit binary strings, then

pMUrtvd22rpMUr22=22rs{0,1}r,s0pM^(s)2

By expanding the Fourier coefficients of pM, we have
pM^(s)= 12rz{0,1}rpM(z)(1)zs = 1|A|2rΔ{0,1}rPr[Δ=Δ](|{xA:(Mx+Δ)s=0}||{xA:(Mx+Δ)s=1}|) = 1|A|2rΔ{0,1}rPr[Δ=Δ](|{xA:x(MTs)=(Δ)Ts}||{xA:x(MTs)=(Δ)Ts+1}|) = 1|A|2rΔ{0,1}rPr[Δ=Δ]xf(x)(1)x(MTs)(1)(Δ)Ts = 2nf^(MTs)|A|2rΔ{0,1}rPr[Δ=Δ](1)(Δ)Ts = 2nf^(MTs)|A|2r(2ρ)|s|

where the last equation is derived from the fact that the summation is exactly the expansion of the polynomial

((1/2+ρ)+(1/2ρ))r|s|((1/2ρ)(1/2+ρ))|s|=(2ρ)|s|

Therefore, we can rewrite the TV distance as

pMUrtvd2 22rs{0,1}r,s0pM^(s)2 (1)
= 22n|A|2s{0,1}r,s0f^(MTs)2(2ρ)2|s|
= 22n|A|2v{0,1}nf^(v)2s0(2ρ)2|s|𝟙v=MTs
= 22n|A|2l0v{0,1}n,|v|=lf^(v)2s0(2ρ)2|s|𝟙v=MTs

where the first inequality is by Cauchy-Schwartz, the second is Parseval’s equality. We will show that s0(2ρ)2|s|𝟙v=MTs is well-bounded for every v. Combined with Lemma 4, we get our desired bound.

If we regard s as an indicator of a subgraph of G, where each edge ei is selected if si=1, the summation represents the total weight of different vector s such that the parity of degrees of vertices in the subgraph induced by s is exactly the vector v.

Lemma 5.

Let M be the edge incidence matrix of a graph G drawn from 𝒢n,αn/ε2. For any vector v{0,1}n with Hamming weight l=|v|, the following bound on the expected value holds for parameters ε(ω(1n),0.05) and α[ε2n,104]:

𝔼M[s{0,1}r,s0(2ρ)2|s|𝟙v=MTs]{1010α3if l=0(104α/n)l/2if l>0

The above expectation bound holds because the weight ρ=10ε is small enough, and because r is upper bounded by αn/ε2.

Proof.

We will first bound the expectation for every fixed r. Notice that the graph distribution 𝒢n,αn/ε2 given the number of edges r fixed is uniform over the set of all r-edge simple graphs.

Now fix an arbitrary rαn/ε2. We bound the expectation of the summation by counting the number of matrices M such that v=MTs for every fixed vector s{0,1}r. Assume l to be an even number as otherwise the summation is always 0. Denote by GM,s the graph induced by M and s. Then the set of all edges in GM,s can be decomposed into l/2 walks, each walk either starts and finishes at the same (arbitrary) vertex, or starts and finishes at vertices where vi=1. There are exactly l/2 walks of the second type, which forms the vector v.

We first bound the probability that k random edges form a walk. Either the walk has determined start/end points, or it is a closed walk. For closed walks, this probability that k selected (and distinct) edges from M of a fixed order form a closed walk is at most

n(2.01/n)k1(2.01/n2)=2.01k/nk

The factor of n denotes the number of start vertices of the walk. At each step of the walk, with probability at most n1(n2)r2.01/n the next sampled edge shares an endpoint with the previous edge. The last step of the walk must go back to the start point, hence with probability 1(n2)2.01/n2 one samples this edge.

Analogously, for walks with fixed start and end vertices, the probability is at most

(2.01/n)k1(2.01/n2)=2.01k/nk+1

Therefore, for every fixed r, we have

𝔼M:M{0,1}r×n[s{0,1}r,s0(2ρ)2|s|𝟙v=MTs]
t={3if l=0l/2if l>0r(rt)(2ρ)2t2tt!2.01t/nt+l/2
= t={3if l=0l/2if l>0r(rt)t!1608tε2t/nt+l/2
t={3if l=0l/2if l>0rrt1608tε2t/nt+l/2

where the last inequality is due to the fact that (rt)t!rt. The factor of 2tt! accounts for ordering the t edges arbitrarily (t! ways) and independently partitioning it to multiple walks (2t1 ways).

Notice that for l=0, one needs to select at least 3 distinct edges to form v=0n since s0. The number of edges cannot be 2 because two edges do not form a cycle in a simple graph. For l>0, one needs at least l/2 edges. Note that the desired expectation is a weighted sum of the expectations for every fixed r. We have

𝔼M:GM𝒢n,αn/ε2[s{0,1}r,s0(2ρ)2|s|𝟙v=MTs]
maxrαn/ε2𝔼M:M{0,1}r×n[s{0,1}r,s0(2ρ)2|s|𝟙v=MTs]
maxrαn/ε2t={3if l=0l/2if l>0rrt1608tε2t/nt+l/2
t={3if l=0l/2if l>0(1608α)t/nl/2
{1010α3if l=0(104α/n)l/2if l>0

The sum is dominated by small t because it is a decreasing geometric sum when α104.

With Lemma 5, we are able to bound the statistical distance between pM and the uniform distribution.

Lemma 6.

Fix three parameters ε(ω(1n),0.05), γ>ω(1nlogn) and α[ε2n,min(107γ2,104)]. Let τ=(γ+α)α and c=γn+log(1/τ). Let A{0,1}n of size at least 2nc, and let f:{0,1}n{0,1} be the indicator of A. Let G be a random graph sampled from 𝒢n,αn/ε2. Then

𝔼M[pMUrtvd2]=O((α2+γ2)α)
Proof.

Recall that by inequality (1),

pMUrtvd222n|A|2l0v{0,1}n,|v|=lf^(v)2s0(2ρ)2|s|𝟙v=MTs

We break this summation into two parts: the part l[0,4c) and the part l[4c,n]. For the first part, by Lemma 4 and Lemma 5

𝔼M[22n|A|2l=0,l is even4c1v{0,1}n,|v|=lf^(v)2s0(2ρ)2|s|𝟙v=MTs]
1010α3+l=2,l is even4c1(42cl)l(104α/n)l/2
O(α3)+l=2,l is even(106γ2α/l2)l/2
= O((α2+γ2)α)

where 22n|A|2f^(0n)=1 and 106γ2α<0.1.

For the part l[4c,n], we have

𝔼M[22n|A|2l4cv{0,1}n,|v|=lf^(v)2s0(2ρ)2|s|𝟙v=MTs]
2c(104α/n)2c
= (2104α/n)2γn+O(logn)
= nω(logn)

where we used the fact that

22n|A|2v{0,1}n,|v|=lf^(v)222n|A|2v{0,1}nf^(v)2=2n|A|2c

Together with the part l[0,4c), we have

𝔼M[pMUrtvd2]=O((α2+γ2)α)

We will need the following lemma, which is the Lemma 5.6 from [31].

Lemma 7 (Lemma 5.6 of [31]).

Let (X,Y1),(X,Y2) be random variables taking values on finite sample space Ω=Ω1×Ω2. For any xΩ1 let Yxi, i=1,2 denote the conditional distribution of Yi given the event {X=x}. Then

(X,Y1)(X,Y2)tvd=𝔼X[YX1YX2tvd]

We now prove Lemma 3.

Proof of Lemma 3.

Let P(x) be Alice’s message function and let Q(M,i,w) be Bob’s function, where M is the edge incidence matrix of the input graph, w is input vector of Bob, and i is Alice’s message. Since we are analyzing the protocol under a fixed input distribution, we can assume P and Q are deterministic.

Let D1 denote the distribution of (M,P(x),w) under YES instances, and D2 the same under NO instances. We aim to show that

D1D2tvd=O((γ+α)α1/2),

which implies that the protocol’s distinguishing advantage is at most O((γ+α)α) on PD-BHP.

The function P(x) partitions {0,1}n into sets A1,,A2c, where c=γn is the number of bits communicated. Since there are 2c such sets, at least a 1τ fraction of {0,1}n is contained in those Ai of size at least τ2nc, where we define τ:=(α+γ)α. We call any message i with |Ai|τ2nc a typical message. Then the probability that i=P(x) is not typical on a uniformly random input x is at most τ.

Let us write D1=(M,i,pM,i), where pM,i is the conditional distribution of w given M and i under the YES distribution, and write D2=(M,i,Ur), where Ur is the uniform distribution over {0,1}r. For each M and i, let D(M,i)1:=pM,i and D(M,i)2:=Ur denote the respective conditional distributions of w.

Applying Lemma 7, we get:

D1D2tvd =𝔼i[𝔼M[D(M,i)1D(M,i)2tvd]]
Pr[i not typical]+𝔼M[pM,iUrtvd](i any typical message)
τ+𝔼M[pM,iUrtvd]

Now fix any typical message i. Then, by definition, |Ai|2nc where c=γn+log(1/τ). Lemma 6 then implies

𝔼M[pM,iUrtvd2]=O((γ2+α2)α).

Applying Jensen’s inequality yields

𝔼M[pM,iUrtvd]𝔼M[pM,iUrtvd2]=O((γ+α)α).

Combining this with the earlier bound, we conclude that

D1D2tvd=O((γ+α)α),

as claimed.

3.3 Same vector problem

We prove a query-space lower bound for the following problem in the random query model.

Same vector problem

Let N=(n2), p[0,1] and k=o(N). Let q<N divides N. Let X,Y1,,Yk be binary vectors drawn from Bern(p)N independently, i.e., they are length-N vectors with each bit being 1 with probability p. An algorithm for this problem has k phases. At each phase k[k], the algorithm makes q random queries. The return of each random query is a pair (It,WIt) where ItR[N] independently. For WIt, (i) in the YES case, WIt=XIt; (ii) in the NO case, WIt=Yk,It at phase k. The algorithm must distinguish between the two cases above.

When k>1, a trivial upper bound is to memorize O~(N/kq) queries from the first kq/2 queries, and check if there is any collision with the latter kq/2 samples. We show that the upper bound is tight using information theory. Formally, we obtain the following lower bound.

Lemma 8.

Fix k1 and q2n. Fix p[0,1] to be a function of n. For every algorithm that solves the same vector problem correctly with probability 2/3, the algorithm must use Ω(N/kq) bits of memory. Specifically, this is true when the algorithm outputs correctly w.p. 2/3 on an input distribution where the input is in the YES case and the NO case uniformly.

We will need to use Fano’s inequality and Shearer’s lemma in the proof.

Proposition 9 (Fano’s inequality).

Let XYX~ be a Markov chain, and let pe=Pr[X~X]. Let H(x)=xlogx(1x)log(1x) denote the binary entropy function. Let 𝒳 be the support of X. Then

H(pe)+pelog(|𝒳|1)H(X|Y)
Proposition 10 (Shearer’s lemma [19]).

Let X1,,Xn be random variables, and let S1,,Sm[n] be subsets such that each i[n] belongs toat least k sets. Then

kH(X1,,Xn)j=1mH(Xi:iS)
Proof to Lemma 8.

We will prove the lower bound to any algorithm over the distribution where the input is in the YES case and the NO case uniformly.

Let c denote the length of the memory. For every j[kq], let Mj{0,1}c denote the memory after receiving the j-th random query. Specifically, M0=0N denotes the initial memory.

Without loss of generality assume that Mkq is the output of the algorithm. Let D be an indicator such that D=1 at the YES case, and D=0 at the NO case. Let DMkqMkq be the Markov chain. By Fano’s inequality and pe1/3, we get

0.92H(1/3)H(pe)H(D|Mkq)=1I(D;Mkq) (2)

We will decompose the mutual information I(D;Mkq) into the sum of increases of mutual information in each phase. Formally, we denote by I(t) the list of random variables It(q1)+1,,I(tq) in phase t, and we use the same notation for other variables.

I(D;Mkq) (3)
= t=1kI(D;Mtq)I(D;M(t1)q)
t=1kI(D;M(t1)q,I(t),WI(t))I(D;M(t1)q)
= t=1kI(D;I(t),WI(t)|M(t1)q)
= t=1kI(D;WI(t)|I(t),M(t1)q)

by the data processing inequality.

We will upper bound each I(D;WI(t)|I(t),M(t1)q) by I(M(t1)q;XI(t)|D,I(t)). For every t,m and i(t), let γt,m,i(t)=Pr[D=0|Mt1=m,It=i(t)]. We can rewrite

t=1kI(D;WI(t)|I(t),M(t1)q)
= t=1kH(WI(t)|I(t),M(t1)q)H(WI(t)|D,I(t),M(t1)q)
t=1kH(WI(t)|I(t))𝔼m,i(t)[H(WI(t)|D,I(t)=i(t),M(t1)q=m)]
= t=1kH(WI(t)|I(t))𝔼m,i(t)[γt,m,iH(WI(t)|D=0,I(t)=i(t),M(t1)q=m)+
(1γt,m,i)H(WI(t)|D=1,I(t)=i(t),M(t1)q=m)]

where the inequality follows from the fact that conditioning reduces entropy.

And we can rewrite the above as the summation of I(M(t1)q;XI(t)|D,I(t)):

t=1kI(M(t1)q;XI(t)|D,I(t)) (4)
= t=1kH(XI(t)|D,I(t))H(XI(t)|D,M(t1)q,I(t))
= t=1kH(XI(t)|I(t))𝔼m,i(t)[H(XI(t)|D,M(t1)q=m,I(t)=i(t))]
= t=1kH(WI(t)|I(t))𝔼m,i(t)[γt,m,iH(XI(t)|D=0,I(t)=i(t),M(t1)q=m)+
(1γt,m,i)H(XI(t)|D=1,I(t)=i(t),M(t1)q=m)]
= t=1kH(WI(t)|I(t))𝔼m,i(t)[γt,m,iH(WI(t)|D=0,I(t)=i(t),M(t1)q=m)+
(1γt,m,i)H(WI(t)|D=1,I(t)=i(t),M(t1)q=m)]
t=1kI(D;WI(t)|I(t),M(t1)q)

We are able to replace the conditional entropy of XI(t) by WI(t) regardless of whether the answer is YES or NO. When D=1 (the YES case), WI(t)=XI(t) by our definition. When D=0 (the NO case), WI(t)=Yt,I(t) is independent from previous samples, and follows the same (conditional) distribution as XI(t).

Let S[N]q denote a random subset S[N] to be the range of a random vector from [N]q. We have
I(M(t1)q;XI(t)|D,I(t)) =12I(M(t1)q;XI(t)|D=1,I(t)) =12𝔼S[N]q[I(M(t1)q;XS|D=1)] =12𝔼S[N]q[H(XS|D=1)H(XS|D=1,M(t1)q)] =12𝔼S[N]q[H(XS|D=1)]12𝔼S[N]q[H(XS|D=1,M(t1)q)] =12𝔼S[N]q[iSH(Xi|D=1)]12𝔼S[N]q[H(XS|D=1,M(t1)q)] (5)

Let μ=PrS[N]q[iS],i[N]. We have μ=1(11/N)qq/N. Then, by Shearer’s Lemma (Proposition 10),

𝔼SI(t)[H(XS|D=1,M(t1)q)]μH(X|D=1,M(t1)q).

And,

𝔼SI(t)[iSH(Xi|D=1)] =i[N]PrS[N]q[iS]H(Xi|D=1)
=μ(i[N]H(Xi|D=1))
=μH(X|D=1).

Substituting in Equation (5), we get

I(M(t1)q;XI(t)|D=1,I(t)) μ2H(X|D=1)μ2H(X|D=1,M(t1)q)
=μ2I(M(t1)q,X|D=1)
cq2N.

Combined with inequalities (2), (3), (4), we obtain
0.921I(D;Mkq)1t=1kI(D;WI(t)|I(t),M(t1)q)1t=1kI(M(t1)q;XI(t)|D,I(t))1ckq2N

and we get a space lower bound c0.16Nkq.

3.4 Proof to the main theorem

In this section, we prove the main theorem, which is a query-space lower bound to approximating the max cut in the random query model. We repeat the theorem here.

Theorem 1. [Restated, see original statement.]

Let G=(V,E) be an undirected simple graph with n vertices. Let Π be any randomized algorithm that, in the random query model, approximates the correlation clustering cost of G to within an additive error of εn2 with probability at least 99/100. For this algorithm, if the worst-case query complexity is q and the space used is at most γn bits, then the following lower bound holds:

q={Ω(min(nε2γ,nnγ))if γ<1Ω(min(nε2γ2,nnγ))if γ1

for parameters ε(ω(1n),0.05) and γ(ω(lognn),o(n)].

The lower bound is obtained through a reduction from both the same-vector problem and the PD-BHP problem to the max cut problem. Our reduction follows a similar structure to [31] despite adding a reduction to the same vector problem. This reduction is necessary in our case. We start from introducing the reduction used in [31]. Then we outline how we adapt this reduction to obtain our result. After presenting a full map, we will show the formal reduction.

In [31], the main result is a Ω~(n) space lower bound for (2ε)-approximating the max cut given a stream of edges of the input graph. To that end, they show that every algorithm with limited space cannot distinguish a bipartite graph from an Erdös-Rényi graph with the same expected number of edges. The lower bound is also achieved by a reduction to the D-BHP problem, which is similar to the PD-BHP problem we study in this work but (i) the noise vector is always 0n; (ii) the number of edges given to Bob is Θ(n/k) where k=Θ(logn) is the number of phases. Besides, their total number of edges over all k phases in the stream is only Θ(n).

Their reduction has two steps. First, they show that a stream of k independently sampled random graphs of Θ(n/k) edges is close in total variation to a stream of the edges of a random graph of Θ(n) edge. This holds both when the graph is sampled from an Erdös-Rényi distribution or when the graph is a random bipartite graph given a fixed random partition. It is true because the total number of edges is Θ(n) and it is unlikely an edge will be sampled twice in two different phases. Given the first step, we only need to show that algorithms cannot distinguish between a stream of k phases of random sparse bipartite graphs given and a stream of k phases of random sparse Erdos-Renyi graphs.

The next step is the standard hybrid argument. For every i{0,,k}, let 𝒟(i)Y denote the input distribution where the first i phases are random sparse bipartite graphs given a fixed random partition, and the last ki phases are sparse Erdös-Rényi graphs. By the hybrid argument, if there is an algorithm that can distinguish the two cases with constant advantage, there also exists an index i[k] such that the algorithm can distinguish the two distributions 𝒟(i1)Y and 𝒟(i)Y with Ω(1/k) advantage. Such an algorithm Π can be used to device an algorithm for the D-BHP problem: Alice, given a partition as input, samples i1 sparse graphs of Θ(n/k) edges and runs Π on it locally. Then, Alice sends the content of the space of Π to Bob. Bob sets the graph of the i-th phase to be its input graph, and the last ni phases to be Erdös-Rényi graphs. Bob runs Π on the given space and the last ni+1 phases, and outputs the output of Π. Finally, by the lower bound to D-BHP, the streaming algorithm also requires Ω~(n) space.

In our case, however, it is impossible to show that a stream of random samples of pairs of vertices to the graph is statistically indistinguishable from a stream of k phases of independent random graphs of 1/k size. This is because each phase has ω(n) pairs of vertices in our case. Therefore, almost certainly many pairs of vertices will be sampled twice and cause a conflict (i.e., a pair of vertices is connected in one phase, but disconnected in another). To that end, we use our lower bound established above for the same-vector problem to show that every algorithm in the random query model with bounded time and space cannot distinguish a stream of consistent random queries of a fixed random graph from a stream of random queries that consists of k phases of independent random graphs. Given that, we still use the hybrid argument to reduce the streaming lower bound to the communication lower bound for the PD-BHP problem studied in Section 3.2. We refer readers to Figure 1 for an illustration to our reductions.

Proof to Theorem 1.

Suppose for the sake of contradiction that there is a randomized algorithm in the random query model that makes t queries to the input graph, uses γn bits of memory, and approximates the correlation clustering cost to within additive error εn2 with probability at least 99/100. Here we let

t:=C{Ω(min(nε2γ,nnγ))if γ<1Ω(min(nε2γ2,nnγ))if γ1

where C>0 is a small enough constant. By Yao’s minimax principle, there also exists a deterministic algorithm Π with the same query complexity and space complexity such that Π approximates the correlation clustering cost given a random input from 𝒟t=12𝒟tY+12𝒟tN with 99/100 probability. Let Π be the algorithm that outputs 1 if and only if the output of Π is less than n242.5εn2. By Lemma 2, Π is a distinguisher for input distributions 𝒟tY and 𝒟tN with high advantage. Formally, we have

PrS𝒟tY[Π(S)=1]99/100nω(1) (6)

and

PrS𝒟tN[Π(S)=1]1/100+nω(1) (7)

where S denotes the input stream.

Let k=max(1γγ,1). Without loss of generality, we assume k divides t. We obtain that

PrS𝒟t,k[Π(S)=1]0.6

Otherwise, by (6), Π solves the same-vector problem correctly with probability 0.695nω(1) on the distribution 12𝒟tY+12𝒟t,k, which contradicts Lemma 8 since Ntγn/C. Analogously, by (7) and Lemma 8, we have

PrS𝒟t,0[Π(S)=1]0.4

By the standard hybrid argument, there exists a number i[k] such that Π distinguishes between 𝒟t,i1 and 𝒟t,i. Formally, if we let pi:=PrS𝒟t,i[Π(S)=1] for every i=0,,k, we have

maxi[k]|pipi1|1kik|pipi1|1k|pkp0|(0.60.4)/k=15k

Given such an algorithm Π and index i, we can obtain an algorithm for the PD-BHP problem with 110k advantage in the communication model.

Formally, the algorithm works as follows: when Alice receives a partition P{0,1}n as input, she independently samples a sequence of i1 graphs G1,,Gi1 drawn from 𝒢PY locally. Alice then runs the algorithm Π on i1kt random queries to these graphs. After that, she sends the content of the space of Π to Bob. Bob receives the space of Π, the input graph G and the input vector w. He samples a sequence of t/k triples ((u1,v1,e1),,(ut/k,vt/k,et/k)) on the marginal distribution that a random graph from 𝒢n,t/k is exactly G, where each ei is the the corresponding bit in w. After this, Bob samples a sequence of ki graphs Gi+1,,Gk from 𝒢N and correspondingly kikt random queries. Bob then simulates Π on these random queries with its initial content of the space set to the message of Alice. Finally, Bob outputs the output of Π.

To see its correctness, notice that the input distribution of Bob is exactly the distribution of subgraphs induced by the i-th phase of the random queries. On the YES case, Alice and Bob together simulates Π on the input distribution 𝒟t,k, whereas on the NO case, they simulates Π on 𝒟t,k1. Hence the above algorithm distinguishes YES cases from NO cases with advantage 110k. However, when γ<1, by applying Lemma 3 and setting α=tkn/ε2Cγ, the advantage should be (γ+α)αCγγ, which is smaller than 110k=0.1γγ when C is small enough. When γ1, by setting α=tkn/ε2Cγ2, the advantage should be (γ+α)αC<110k=0.1. Here C is a constant that is decreasing when C tends to 0. Therefore, we get a contradiction.

References

  • [1] Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5):1–27, 2008. doi:10.1145/1411509.1411513.
  • [2] Noga Alon, Tali Kaufman, Michael Krivelevich, and Dana Ron. Testing triangle-freeness in general graphs. SIAM Journal on Discrete Mathematics, 22(2):786–819, 2008. doi:10.1137/07067917X.
  • [3] Sepehr Assadi, Vaggos Chatziafratis, Jakub Lacki, Vahab Mirrokni, and Chen Wang. Hierarchical clustering in graph streams: Single-pass algorithms and space lower bounds. In Po-Ling Loh and Maxim Raginsky, editors, Conference on Learning Theory, 2-5 July 2022, London, UK, volume 178 of Proceedings of Machine Learning Research, pages 4643–4702. PMLR, PMLR, 2022. URL: https://proceedings.mlr.press/v178/assadi22a.html.
  • [4] Sepehr Assadi, Vihan Shah, and Chen Wang. Streaming algorithms and lower bounds for estimating correlation clustering cost. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023, volume 36, pages 75201–75213, 2023. URL: http://papers.nips.cc/paper_files/paper/2023/hash/ee1a1ecc92f35702b5c29dad3dc909ea-Abstract-Conference.html.
  • [5] Sepehr Assadi and Chen Wang. Sublinear time and space algorithms for correlation clustering via sparse-dense decompositions. In Mark Braverman, editor, 13th Innovations in Theoretical Computer Science Conference, ITCS 2022, Berkeley, CA, USA, January 31 - February 3, 2022, volume 215 of LIPIcs, pages 10:1–10:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.ITCS.2022.10.
  • [6] Nikhil Bansal, Avrim Blum, and Shuchi Chawla. Correlation clustering. Machine Learning, 56:89–113, 2004. doi:10.1023/B:MACH.0000033116.57574.95.
  • [7] Soheil Behnezhad, Moses Charikar, Weiyun Ma, and Li-Yang Tan. Almost 3-approximate correlation clustering in constant rounds. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 720–731. IEEE, 2022. doi:10.1109/FOCS54457.2022.00074.
  • [8] Soheil Behnezhad, Moses Charikar, Weiyun Ma, and Li-Yang Tan. Single-pass streaming algorithms for correlation clustering. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 819–849. SIAM, 2023. doi:10.1137/1.9781611977554.ch33.
  • [9] Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Cliff Stein, and Madhu Sudan. Fully dynamic maximal independent set with polylogarithmic update time. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 382–405. IEEE, 2019. doi:10.1109/FOCS.2019.00032.
  • [10] Aditya Bhaskara, Samira Daruki, and Suresh Venkatasubramanian. Sublinear algorithms for MAXCUT and correlation clustering. In Ioannis Chatzigiannakis, Christos Kaklamanis, Dániel Marx, and Donald Sannella, editors, 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, Prague, Czech Republic, July 9-13, 2018, volume 107 of LIPIcs, pages 16:1–16:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPIcs.ICALP.2018.16.
  • [11] Marco Bressan, Nicolò Cesa-Bianchi, Andrea Paudice, and Fabio Vitale. Correlation clustering with adaptive similarity queries. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, volume 32, pages 12510–12519, 2019. URL: https://proceedings.neurips.cc/paper/2019/hash/b0ba5c44aaf65f6ca34cf116e6d82ebf-Abstract.html.
  • [12] Mélanie Cambus, Fabian Kuhn, Etna Lindy, Shreyas Pai, and Jara Uitto. A (3+ε)-approximate correlation clustering algorithm in dynamic streams. In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2861–2880. SIAM, 2024. doi:10.1137/1.9781611977912.101.
  • [13] Nairen Cao, Vincent Cohen-Addad, Euiwoong Lee, Shi Li, David Rasmussen Lolck, Alantha Newman, Mikkel Thorup, Lukas Vogl, Shuyi Yan, and Hanwen Zhang. Solving the correlation cluster lp in sublinear time. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing (STOC), pages 1154–1165, 2025. doi:10.1145/3717823.3718181.
  • [14] Nairen Cao, Vincent Cohen-Addad, Euiwoong Lee, Shi Li, Alantha Newman, and Lukas Vogl. Understanding the cluster linear program for correlation clustering. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing (STOC), pages 1605–1616, 2024. doi:10.1145/3618260.3649749.
  • [15] Nairen Cao, Shang-En Huang, and Hsin-Hao Su. Breaking 3-factor approximation for correlation clustering in polylogarithmic rounds. In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 4124–4154. SIAM, 2024. doi:10.1137/1.9781611977912.143.
  • [16] Moses Charikar, Venkatesan Guruswami, and Anthony Wirth. Clustering with qualitative information. Journal of Computer and System Sciences, 71(3):360–383, 2005. doi:10.1016/j.jcss.2004.10.012.
  • [17] Shuchi Chawla, Konstantin Makarychev, Tselil Schramm, and Grigory Yaroslavtsev. Near optimal lp rounding algorithm for correlationclustering on complete and complete k-partite graphs. In Proceedings of the 47th annual ACM Symposium on Theory of Computing (STOC), pages 219–228, 2015. doi:10.1145/2746539.2746604.
  • [18] Chi-Ning Chou, Alexander Golovnev, and Santhoshini Velusamy. Optimal streaming approximations for all boolean max-2csps and max-ksat. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 330–341. IEEE, 2020. doi:10.1109/FOCS46700.2020.00039.
  • [19] Fan RK Chung, Ronald L Graham, Peter Frankl, and James B Shearer. Some intersection theorems for ordered sets and graphs. Journal of Combinatorial Theory, Series A, 43(1):23–37, 1986. doi:10.1016/0097-3165(86)90019-1.
  • [20] Vincent Cohen-Addad, Silvio Lattanzi, Slobodan Mitrović, Ashkan Norouzi-Fard, Nikos Parotsidis, and Jakub Tarnawski. Correlation clustering in constant many parallel rounds. In International Conference on Machine Learning (ICML), pages 2069–2078. PMLR, 2021. URL: http://proceedings.mlr.press/v139/cohen-addad21b.html.
  • [21] Vincent Cohen-Addad, Euiwoong Lee, Shi Li, and Alantha Newman. Handling correlated rounding error via preclustering: A 1.73-approximation for correlation clustering. In IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 1082–1104. IEEE, 2023. doi:10.1109/FOCS57990.2023.00065.
  • [22] Vincent Cohen-Addad, Euiwoong Lee, and Alantha Newman. Correlation clustering with sherali-adams. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 651–661. IEEE, 2022. doi:10.1109/FOCS54457.2022.00068.
  • [23] Vincent Cohen-Addad, David Rasmussen Lolck, Marcin Pilipczuk, Mikkel Thorup, Shuyi Yan, and Hanwen Zhang. Combinatorial correlation clustering. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing (STOC), pages 1617–1628, 2024. doi:10.1145/3618260.3649712.
  • [24] Itai Dinur. Time-space lower bounds for bounded-error computation in the random-query model. In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2900–2915. SIAM, 2024. doi:10.1137/1.9781611977912.103.
  • [25] Yumou Fei, Dor Minzer, and Shuo Wang. Multi-pass streaming lower bounds for approximating max-cut. 66th IEEE Symposium on Foundations of Computer Science (FOCS) 2025, 2025.
  • [26] Dmitry Gavinsky, Julia Kempe, Iordanis Kerenidis, Ran Raz, and Ronald De Wolf. Exponential separations for one-way quantum communication complexity, with applications to cryptography. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing (STOC), pages 516–525, 2007. doi:10.1145/1250790.1250866.
  • [27] Venkatesan Guruswami and Runzhou Tao. Streaming hardness of unique games. In Dimitris Achlioptas and László A. Végh, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2019, Massachusetts Institute of Technology, Cambridge, MA, USA, September 20-22, 2019, volume 145 of LIPIcs, pages 5:1–5:12. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPIcs.APPROX-RANDOM.2019.5.
  • [28] Venkatesan Guruswami, Ameya Velingker, and Santhoshini Velusamy. Streaming complexity of approximating max 2csp and max acyclic subgraph. In Klaus Jansen, José D. P. Rolim, David Williamson, and Santosh S. Vempala, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2017, Berkeley, CA, USA, August 16-18, 2017, volume 81 of LIPIcs, pages 8:1–8:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPIcs.APPROX-RANDOM.2017.8.
  • [29] John Kallaugher, Michael Kapralov, and Eric Price. The sketching complexity of graph and hypergraph counting. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 556–567. IEEE, 2018. doi:10.1109/FOCS.2018.00059.
  • [30] Michael Kapralov, Sanjeev Khanna, and Madhu Sudan. Approximating matching size from random streams. In Proceedings of the 25th annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 734–751. SIAM, 2014. doi:10.1137/1.9781611973402.55.
  • [31] Michael Kapralov, Sanjeev Khanna, and Madhu Sudan. Streaming lower bounds for approximating MAX-CUT. In Piotr Indyk, editor, Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 1263–1282. SIAM, SIAM, 2015. doi:10.1137/1.9781611973730.84.
  • [32] Michael Kapralov, Sanjeev Khanna, Madhu Sudan, and Ameya Velingker. (1+ ω (1))-approximation to max-cut requires linear space. In Philip N. Klein, editor, Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 1703–1722. SIAM, SIAM, 2017. doi:10.1137/1.9781611974782.112.
  • [33] Tali Kaufman, Michael Krivelevich, and Dana Ron. Tight bounds for testing bipartiteness in general graphs. SIAM Journal on Computing, 33(6):1441–1483, 2004. doi:10.1137/S0097539703436424.
  • [34] Dmitry Kogan and Robert Krauthgamer. Sketching cuts in graphs and hypergraphs. In 6th Conference on Innovations in Theoretical Computer Science (ITCS), pages 367–376, 2015. doi:10.1145/2688073.2688093.
  • [35] Konstantin Makarychev and Sayak Chakrabarty. Single-pass pivot algorithm for correlation clustering. keep it simple! In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023, volume 36, pages 6412–6421, 2023. URL: http://papers.nips.cc/paper_files/paper/2023/hash/149ad6e32c08b73a3ecc3d11977fcc47-Abstract-Conference.html.
  • [36] Ran Raz and Wei Zhan. The random-query model and the memory-bounded coupon collector. In Thomas Vidick, editor, 11th Innovations in Theoretical Computer Science Conference, ITCS 2020, Seattle, Washington, USA, January 12-14, 2020, volume 151 of LIPIcs, pages 20:1–20:11. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPIcs.ITCS.2020.20.
  • [37] Elad Verbin and Wei Yu. The streaming complexity of cycle counting, sorting by reversals, and other problems. In Proceedings of the 22nd annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 11–25. SIAM, 2011. doi:10.1137/1.9781611973082.2.