
Simultaneously Approximating All Norms for Massively Parallel Correlation Clustering

Nairen Cao, Department of Computer Science and Engineering, New York University, NY, USA; Shi Li, School of Computer Science, Nanjing University, China; Jia Ye, School of Computer Science, Nanjing University, China
Abstract

We revisit the simultaneous approximation model for the correlation clustering problem introduced by Davies, Moseley, and Newman [21]. The objective is to find a clustering that minimizes a given norm of the disagreement vector over all vertices.

We present an efficient algorithm that produces a clustering that is simultaneously a 63.3-approximation for all monotone symmetric norms. This significantly improves upon the previous approximation ratio of 6348 due to Davies, Moseley, and Newman [21], which works only for $\ell_p$-norms.

To achieve this result, we first reduce the problem to approximating all top-k norms simultaneously, using the connection between monotone symmetric norms and top-k norms established by Chakrabarty and Swamy [11]. Then we develop a novel procedure that constructs a 12.66-approximate fractional clustering for all top-k norms. Our 63.3-approximation ratio is obtained by combining this with the 5-approximate rounding algorithm by Kalhan, Makarychev, and Zhou [26].

We then demonstrate that with a loss of ϵ in the approximation ratio, the algorithm can be adapted to run in nearly linear time and in the MPC (massively parallel computation) model with poly-logarithmic number of rounds.

By allowing a further trade-off in the approximation ratio to (359+ϵ), the number of MPC rounds can be reduced to a constant.

Keywords and phrases:
Correlation Clustering, All-Norms, Approximation Algorithm, Massively Parallel Algorithm
Category:
Track A: Algorithms, Complexity and Games
Funding:
Nairen Cao: Supported by NSF grant CCF-2008422.
Shi Li: Supported by the State Key Laboratory for Novel Software Technology, and the New Cornerstone Science Laboratory.
Jia Ye: Supported by the State Key Laboratory for Novel Software Technology, and the New Cornerstone Science Laboratory.
Copyright and License:
© Nairen Cao, Shi Li, and Jia Ye; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Approximation algorithms analysis; Theory of computation → Massively parallel algorithms
Related Version:
Full Version: https://arxiv.org/abs/2410.09321
Editors:
Keren Censor-Hillel, Fabrizio Grandoni, Joël Ouaknine, and Gabriele Puppis

1 Introduction

Clustering is a classic problem in unsupervised machine learning. It aims to classify a given set of data elements based on their similarities, with the goal of maximizing the similarity between elements within the same class and minimizing the similarity between elements in different classes. Among the various graph clustering problems, correlation clustering stands out as a classic model. Initially proposed by Bansal, Blum, and Chawla [4], the model has numerous practical applications, including automated labelling [10, 1], community detection and mining [15, 31, 30], and disambiguation task [25], among others.

The input of the standard correlation clustering problem is a complete graph over a set $V$ of $n$ vertices, whose edges are partitioned into the set $E^+$ of $+$edges and the set $E^-$ of $-$edges. The output of the problem is a clustering (or a partition) $\mathcal{C}$ of $V$ that minimizes the number of edges in disagreement: an edge $uv\in\binom{V}{2}$ is in disagreement if $uv\in E^+$ but $u$ and $v$ are in different clusters of $\mathcal{C}$, or $uv\in E^-$ but $u$ and $v$ are in the same cluster of $\mathcal{C}$. Throughout the paper, we use a graph $G=(V,E)$ to denote a correlation clustering instance, with $E$ being $E^+$ and $\binom{V}{2}\setminus E$ being $E^-$.

This problem is known to be APX-hard [13]. There has been a long line of $O(1)$-approximation algorithms for the problem [4, 13, 2, 14, 19, 18, 8], with the current best approximation ratio being 1.437 [8]. The same paper also presented an improved hardness bound of 24/23 for the problem, making the inapproximability constant explicit.

Besides the standard setting, other objectives have been studied recently, with the goal of minimizing some norm of the disagreement vector of the clustering $\mathcal{C}$ over vertices. For a clustering $\mathcal{C}$ of $V$, the disagreement vector of $\mathcal{C}$ is defined as $\mathrm{cost}_{\mathcal{C}}\in\mathbb{R}_{\ge 0}^n$, where $\mathrm{cost}_{\mathcal{C}}(u)$ for every $u\in V$ is the number of edges incident to $u$ that are in disagreement with respect to $\mathcal{C}$. Given some norm $f:\mathbb{R}_{\ge 0}^n\to\mathbb{R}_{\ge 0}$ (that is, $f$ satisfies $f(\alpha x)=\alpha f(x)$ for every real $\alpha\ge 0$ and $x\in\mathbb{R}_{\ge 0}^n$, and $f(x+y)\le f(x)+f(y)$ for every $x,y\in\mathbb{R}_{\ge 0}^n$), the goal of the problem is to minimize $f(\mathrm{cost}_{\mathcal{C}})$. Notice that the standard correlation clustering problem corresponds to the case where $f$ is the $\ell_1$ norm.
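To make the objective concrete, here is a small Python sketch (function and variable names are ours, not from the paper) that computes the disagreement vector of a clustering; any norm of interest can then be evaluated on this vector.

```python
from itertools import combinations

def disagreement_vector(vertices, plus_edges, clustering):
    """Return cost_C: for each vertex, the number of incident edges in disagreement.

    `plus_edges` is a set of frozensets {u, v}; every other pair is a -edge.
    `clustering` maps each vertex to a cluster id.
    """
    cost = {u: 0 for u in vertices}
    for u, v in combinations(vertices, 2):
        same_cluster = clustering[u] == clustering[v]
        is_plus = frozenset((u, v)) in plus_edges
        # A +edge across clusters, or a -edge inside a cluster, is in disagreement.
        if is_plus != same_cluster:
            cost[u] += 1
            cost[v] += 1
    return cost

# The l_1 objective is sum(cost.values()); the l_inf objective is max(cost.values()).
```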

Puleo and Milenkovic [29] initiated the study of correlation clustering with the goal of minimizing the $\ell_p$ norm of the disagreement vector, where $p\in[1,\infty]$. They proved that the problem is NP-hard for the $\ell_\infty$-norm objective. For the $\ell_p$-norm objective with any fixed $p\in[1,\infty]$, they gave a 48-approximation algorithm. The approximation ratio was subsequently improved by Charikar, Gupta, and Schwartz [12] to 7 for the $\ell_\infty$-norm, and by Kalhan, Makarychev and Zhou [26] to 5 for the $\ell_p$-norm with any fixed $p\in[1,\infty]$. Very recently, Heidrich, Irmai, and Andres [24] improved the approximation ratio to 4 for the $\ell_\infty$-norm.

Davies, Moseley and Newman [21] introduced the concept of simultaneous approximation for all $\ell_p$-norms. They developed an efficient algorithm that outputs a single clustering $\mathcal{C}$, which is simultaneously an $O(1)$-approximation for the $\ell_p$ norm for every $p\in[1,\infty]$. This is rather surprising, as it was not known a priori whether such a clustering $\mathcal{C}$ even exists. To achieve this, they first construct a fractional clustering $x$ that is simultaneously an $O(1)$-approximation for all $\ell_p$ norms, and then use the 5-approximate rounding algorithm of Kalhan, Makarychev, and Zhou [26] to round $x$ into an integral clustering $\mathcal{C}$. Crucially, the algorithm of [26] guarantees a per-vertex 5-approximation, meaning that $\mathrm{cost}_{\mathcal{C}}(u)$ is at most 5 times the fractional number of edges in disagreement incident to $u$, for every $u\in V$. This strong property is used to obtain the final simultaneous $O(1)$-approximation in [21].

In light of the growing size of real-world networks, it is imperative to develop efficient parallel algorithms. This urgency is particularly pronounced in machine learning and data mining applications, where timely and efficient processing is essential for extracting meaningful insights from vast datasets. Many works in the literature aim to design efficient parallel algorithms [5, 16, 28, 23, 6, 17, 3, 7, 9]. The MPC model, as a theoretical abstraction of several real-world parallel frameworks such as MapReduce [22], is the prevalent model employed in these works.

1.1 Our results

In this paper, we revisit and generalize the simultaneous approximation model for correlation clustering introduced by [21]. Instead of considering only $\ell_p$ norms, we consider all monotone symmetric norms. We say a norm $f:\mathbb{R}_{\ge 0}^n\to\mathbb{R}_{\ge 0}$ is monotone if for every $x,y\in\mathbb{R}_{\ge 0}^n$ with $x\le y$, we have $f(x)\le f(y)$. We say $f$ is symmetric if $f(x)=f(x')$ for every $x,x'\in\mathbb{R}_{\ge 0}^n$ such that $x'$ is a permutation of $x$. Such norms were considered in [11] in the context of load balancing and clustering. Our first result is that a simultaneous $O(1)$-approximation for all monotone symmetric norms exists for correlation clustering, and it can be constructed in polynomial time.

Definition 1.

Given a correlation clustering instance $G=(V,E)$ and $\alpha\ge 1$, we say a clustering $\mathcal{C}$ over $V$ is simultaneously $\alpha$-approximate, or a simultaneous $\alpha$-approximation, for a family $F$ of norms, if we have $f(\mathrm{cost}_{\mathcal{C}})\le\alpha\cdot f(\mathrm{cost}_{OPT_f})$ for every $f\in F$, where $OPT_f$ is the optimum clustering for $G$ under norm $f$.

Theorem 2.

Given a correlation clustering instance G=(V,E), in polynomial time we can construct a simultaneous 63.3-approximate clustering 𝒞 for the family of monotone symmetric norms.

Next, we are concerned with the running time of the algorithm and its implementation under the MPC model. To state the result, we need a formal description of the MPC model.

The MPC model.

In the MPC model, data is distributed across a set of machines, and computation proceeds in synchronous rounds. During each round, each machine first receives messages from other machines, then performs computations based on this information and its own allocated memory, and finally sends messages to other machines to be received at the start of the next round. Each machine has limited local memory, restricting the total number of messages it can receive or send in a round. The efficiency of the algorithm is measured by the number of rounds, the memory used by each machine, the total memory used by all machines, and the running time over all machines, also known as the total work.

In this paper, we consider the MPC model in the strictly sublinear regime: each machine has $O(n^{\delta})$ local memory, where $n$ is the input size and $\delta>0$ is a constant that can be made arbitrarily small. Under this model, we assume the input assigned to each machine has size $O(n^{\delta})$.

We then describe the correlation clustering problem under the MPC model in the strictly sublinear regime. We use $n=|V|$ and $m=|E|$ to denote the number of vertices and edges respectively in the input graph $G=(V,E)$. The edges $E$ are distributed across the machines, each of which has $O(n^{\delta})$ memory for a constant $\delta>0$ that can be made arbitrarily small. At the end of the algorithm, each machine needs to store in its local memory the IDs of the clusters for all the vertices incident to its assigned edges.

Our main result regarding MPC algorithms is as follows.

Theorem 3.

Let $\epsilon\in(0,1)$. There exists a randomized MPC algorithm in the strictly sublinear regime that, given a correlation clustering instance $G=(V,E)$, in $O(\log^3 n)$ rounds outputs a clustering for $G$ that is simultaneously $(63.3+O(\epsilon))$-approximate for all monotone symmetric norms. This algorithm succeeds with high probability. It uses $\widetilde{O}(m/\epsilon^6)$ total memory and $\widetilde{O}(m/\epsilon^6)$ total work. (As usual, we use $\widetilde{O}(\cdot)$ to hide a poly-logarithmic factor in the input size.)

In particular, the algorithm can be converted into a nearly linear time algorithm that, with high probability, outputs a simultaneous $(63.3+O(\epsilon))$-approximation for all monotone symmetric norms.

Along the way, we develop an MPC rounding algorithm with a per-vertex $(5+55\epsilon)$-approximation guarantee, based on the sequential algorithm of [26]. Given its potential independent interest, we state it here for future reference.

Theorem 4.

Let $\epsilon\in(0,1)$ be a constant. Given a graph $G=(V,E)$ and a set of LP values $(x_{uv})_{u,v\in V}$ satisfying the approximate triangle inequality, that is, $x_{uv}+x_{uw}+\epsilon\ge x_{vw}$ for all $u,v,w\in V$, let $y_u=\sum_{uv\in E}x_{uv}+\sum_{uv\in\binom{V}{2}\setminus E}(1-x_{uv})$ be the LP disagreement of node $u$. There exists an MPC algorithm that computes a clustering $\mathcal{C}$ such that for every node $u$, we have

$\mathrm{cost}_{\mathcal{C}}(u)\le(5+55\epsilon)\,y_u.$

This algorithm always succeeds, but terminates in $O(\log^3 n/\epsilon)$ rounds with high probability and requires $O(n^{\delta})$ memory per machine. Moreover, let $K=E\cup\{uv\in\binom{V}{2}\setminus E : x_{uv}<1\}$ be the set of $+$edges and $-$edges whose LP value is less than 1. The algorithm uses a total memory of $O(|K|\log n)$ and a total work of $O(|K|\log^3 n/\epsilon)$.

The $O(\log^3 n)$ round complexity in the above theorem might not be desirable for many applications. Our next result shows that we can reduce the number of rounds to $O(1)$, albeit with a worse $O(1)$ approximation ratio:

Theorem 5.

Let $\epsilon\in(0,1)$ be a constant. There exists a randomized MPC algorithm in the strictly sublinear regime that, given a correlation clustering instance $G=(V,E)$, in $O(1)$ rounds outputs a clustering that is simultaneously a $(359+\epsilon)$-approximation for all monotone symmetric norms. This algorithm succeeds with high probability, and uses a total memory of $\widetilde{O}(m/\epsilon^2)$ and a total work of $\widetilde{O}(m/\epsilon^2)$.

Overall, relative to [21], our algorithms demonstrate the following improvements.

  1. We generalize the family of norms for the simultaneous approximation from $\ell_p$ norms to all monotone symmetric norms.

  2. We obtain a simpler construction, which leads to a much smaller approximation ratio. Using a result from [11], to simultaneously approximate all monotone symmetric norms, it suffices to approximate all top-$k$ norms: the top-$k$ norm of a non-negative vector is the sum of its largest $k$ coordinates. Though mathematically more general, the top-$k$ norms are more convenient to work with than $\ell_p$ norms.

  3. We can make our algorithm run in nearly linear time. This is the first nearly-linear time simultaneous $O(1)$-approximation algorithm for the problem, even when restricted to $\ell_p$ norms. In contrast, the algorithm of [21] runs in nearly linear time only when the graph $G$ has $O(1)$ maximum degree.

  4. We can make our algorithm run in the MPC model with $O(1)$ rounds. Our work is the first to consider the problem in the MPC model.

1.2 Overview of Techniques

We then discuss our techniques for each of our main results.

Polynomial Time Construction of Simultaneous 𝑶(𝟏)-Approximation for All Symmetric Norms.

By [11], we can reduce the problem of approximating all monotone symmetric norms to approximating all top-$k$ norms. We then construct a fractional solution $x$, which is a metric over $V$ with range $[0,1]$, such that the fractional disagreement vector of $x$ has top-$k$ norm at most $12.66\,\mathrm{opt}_k$ for every $k\in[n]$, where $\mathrm{opt}_k$ is the cost of the optimum clustering under the top-$k$ norm. Then, we use the 5-approximate rounding algorithm of KMZ [26] to obtain a simultaneous 63.3-approximation for all top-$k$ norms. The KMZ rounding algorithm has two crucial properties that we need: it does not depend on $k$, and it achieves a per-vertex guarantee.

We elaborate more on how to construct the metric $x:\binom{V}{2}\to[0,1]$. A natural idea for assigning the LP values, used by [21], is to set $x_{uv}$ based on the intersection of the neighborhoods of $u$ and $v$. Intuitively, the more common neighbors two nodes share, the closer they should be. A straightforward way to implement this idea is to set $x_{uv}=1-\frac{|N(u)\cap N(v)|}{\max\{d(u),d(v)\}}$, where $N(u)$ denotes the set of neighbors of $u$ in $G$ and $d(u)=|N(u)|$ denotes the degree of $u$; it is convenient to assume $u\in N(u)$. This approach works for the top-1 norm (i.e., the $\ell_\infty$ norm) as discussed in [20], but fails for the top-$n$ norm (i.e., the $\ell_1$ norm). Consider a star graph, where the optimal clustering under the top-$n$ norm has cost $\Theta(n)$. This approach assigns $x_{uv}=1-\frac{1}{2}=\frac{1}{2}$ to every $-$edge between two leaves, leading to an LP cost of $\Theta(n^2)$ and a gap of $\Omega(n)$. [21] addressed the issue by rounding the LP values of $-$edges incident to a node up to 1 whenever the node's total LP disagreement on $-$edges exceeds the number of its $+$edges. After this transformation, the triangle inequalities are only satisfied approximately, but this can be handled with an $O(1)$ loss in the approximation ratio.

We address this issue with a different approach, inspired by the pre-clustering technique in [17]. We first preprocess the graph $G$ by removing edges $uv\in E$ for which $|N(u)\cap N(v)|$ is small compared to $\max\{d(u),d(v)\}$. Let the resulting graph be $H$. We then set our LP values as $x_{uv}=1-\frac{|N_H(u)\cap N_H(v)|}{\max\{d(u),d(v)\}}$ for $u\ne v$, where $N_H(u)$ is the set of neighbors of $u$ in $H$. We show that this solution is a 12.66-approximation for all top-$k$ norms simultaneously. Compared to [21], in addition to the improved approximation ratio, we obtain a considerably simpler analysis.

Implementation of Algorithm in Nearly-Linear Time and in MPC Model.

We then proceed to discuss our techniques to improve the running time of the algorithm to nearly-linear. The algorithm contains two parts: the construction of the fractional solution x and the rounding procedure. We discuss the two procedures separately.

Constructing $x$ in nearly linear time poses several challenges. First, the construction of the subgraph $H$ requires us to identify edges $uv\in E$ with small $|N(u)\cap N(v)|$. Second, we cannot explicitly assign $x$ values to all pairs. Finally, to compute $x_{uv}$, we need to compute $|N_H(u)\cap N_H(v)|$.

The first and third challenges can be addressed through sampling, with an $O(\log n)$ factor loss in the running time. To avoid considering too many pairs, we only explicitly consider pairs with length at most $1-\epsilon$. Consequently, for each node $u$ we only need to consider pairs whose other endpoint shares at least an $\epsilon$ fraction of neighbors with $u$. Given that each neighbor of $u$ in $H$ has degree similar to that of $u$, we show that there are at most $O(d(u)/\epsilon)$ such pairs for each node $u$. Overall, there are $\widetilde{O}(m\cdot\mathrm{poly}(1/\epsilon))$ pairs for which we need to explicitly assign $x$ values. Moreover, the nearly-linear time algorithm for constructing $x$ can be naturally implemented in the MPC model with an $O(1)$ number of rounds.
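As an illustration of the sampling idea (not the exact procedure of Algorithm 2), the following hedged Python sketch estimates the neighborhood-overlap fraction $|N(u)\cap N(v)|/d(u)$ by sampling neighbors of $u$; the sample size $\Theta(\log n/\epsilon^2)$ and all names are our own choices.

```python
import random, math

def estimate_overlap_fraction(adj, u, v, eps, n):
    """Estimate |N(u) ∩ N(v)| / d(u) by sampling neighbors of u.

    `adj[u]` is the neighbor set of u (including u itself, matching the paper's
    convention). The sample size Theta(log n / eps^2) is a standard Chernoff-style
    choice; an additive eps error then holds with high probability.
    """
    samples = max(1, math.ceil(3 * math.log(n) / eps ** 2))
    neighbors_u = list(adj[u])
    hits = sum(1 for _ in range(samples)
               if random.choice(neighbors_u) in adj[v])
    return hits / samples
```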

We then proceed to the rounding algorithm for $x$. We are explicitly given the $x$ values for the $+$edges and for a nearly-linear number of $-$edges; for all other pairs, the $x$ value is 1. The KMZ algorithm works as follows: in each round, it selects a node $u$ as a cluster center and forms a cluster from a ball of some radius around $u$, i.e., it includes all nodes $v$ with $x_{uv}\le\mathrm{radius}$ in the cluster, removes the clustered nodes, and repeats the process on the remaining nodes. This rounding algorithm can easily be implemented in nearly-linear time using a priority queue. This leads to a nearly-linear time simultaneous $O(1)$-approximation for correlation clustering for all monotone symmetric norms.
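A minimal sequential sketch of this ball-growing rounding is given below; the radius 2/5 follows the description in Appendix C, pairs with no stored value are treated as distance 1, and, unlike the KMZ algorithm, centers are picked in arbitrary order (the comment notes where the real algorithm differs).

```python
def round_fractional(vertices, x, radius=0.4):
    """Greedy ball-growing rounding: repeatedly pick an unclustered center and
    cluster all unclustered vertices within `radius` of it.

    `x[frozenset((u, v))]` stores the fractional distance for explicitly kept
    pairs; missing pairs default to distance 1.  This sketch picks centers in
    arbitrary order; the KMZ algorithm picks the center maximizing L(u),
    which is what yields the per-vertex guarantee.
    """
    def dist(u, v):
        return 0.0 if u == v else x.get(frozenset((u, v)), 1.0)

    unclustered = set(vertices)
    clusters = []
    while unclustered:
        center = next(iter(unclustered))
        ball = {v for v in unclustered if dist(center, v) <= radius}
        clusters.append(ball)
        unclustered -= ball
    return clusters
```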

The challenge in implementing the algorithm in the MPC model is its sequential nature. [26] observes that if, in each round, we select as cluster center a node maximizing $L(u)=\sum_{v:\,x_{uv}\le r}(r-x_{uv})$, where $r$ is the ball radius, we can effectively bound each node's algorithmic cost. However, choosing a node that maximizes some objective inherently makes the process sequential. Our key observation is that, instead of selecting the node that maximizes $L(u)$, we can allow some slack: this still yields a reasonable approximation ratio with an additional $1+\epsilon$ overhead, while allowing the selection of multiple nodes as cluster centers, thereby parallelizing the rounding process. In each round, there may be several candidate cluster centers that conflict with each other. To resolve these conflicts, we employ the classical algorithm of Luby [27, 16] to find a maximal independent set, ensuring that no two chosen cluster centers conflict.

Organization.

We give some preliminary remarks in Section 2. In Section 3, we describe our simultaneous $O(1)$-approximation algorithm for correlation clustering for all top-$k$ norms. The reduction from any monotone symmetric norm to top-$k$ norms is deferred to Appendix A. Combining the results leads to a simultaneous $O(1)$-approximation algorithm for all monotone symmetric norms. Then, in Appendices B and C, we show how to run the algorithm in the MPC model with nearly linear work; in particular, Appendix B and Appendix C discuss how to construct the LP solution and how to round it in the MPC model, respectively. The constant round MPC algorithm is described in Appendix D. Theorems 2, 3, 4 and 5 are proved in Section 3, Appendix C, Appendix C, and Appendix D, respectively.

2 Preliminaries

The input to correlation clustering is a complete graph whose edges are partitioned into $+$edges and $-$edges. We use the graph $G=(V,E)$ of $+$edges to denote an instance. Let $n=|V|$ and $m=|E|$. For simplicity, we assume $E$ contains all the $n$ self-loops $uu$, $u\in V$. So, $E$ is the set of $+$edges, and $\binom{V}{2}\setminus E$ is the set of $-$edges. The graph $G$ is fixed for most of the paper.

For any graph $H=(V_H,E_H)$ and any vertex $u\in V_H$, let $N_H(u)=\{v\in V_H : uv\in E_H\}$. For any vertex $u\in V_H$ and any subset $S\subseteq V_H$, we define $d_H(u,S)=\sum_{v\in S}\mathbb{1}(uv\in E_H)$ as the number of edges between $u$ and $S$. We simply write $d_H(u)$ for $d_H(u,V_H)$. When the graph $H$ is the input graph $G$, we omit the subscript; so we use $N(u)$ for $N_G(u)$ and $d(u)$ for $d_G(u)$. Notice that $u\in N(u)$ and $d(u)=|N(u)|\ge 1$ for every $u\in V$. For the input graph $G=(V,E)$ and any two vertices $u,v\in V$, we define $M_{uv}=\max\{d(u),d(v)\}$ as the maximum of the degrees of $u$ and $v$, as this quantity will be used frequently. For any two sets $X$ and $Y$, we denote their symmetric difference by $X\Delta Y$. Our algorithms are parameterized by constants $\beta\in(0,1)$ and $\lambda\in(0,1)$ that will be determined later.

A norm on $n$-dimensional non-negative vectors is a function $f:\mathbb{R}_{\ge 0}^n\to\mathbb{R}_{\ge 0}$ satisfying $f(\alpha x)=\alpha f(x)$ for every real $\alpha\ge 0$ and $x\in\mathbb{R}_{\ge 0}^n$, and $f(x+y)\le f(x)+f(y)$ for every $x,y\in\mathbb{R}_{\ge 0}^n$. We say a norm $f$ is monotone if for every $x,y\in\mathbb{R}_{\ge 0}^n$ with $x\le y$, we have $f(x)\le f(y)$. We say $f$ is symmetric if $f(x)=f(x')$ for every $x,x'\in\mathbb{R}_{\ge 0}^n$ such that $x'$ is a permutation of $x$. We say $f$ is the top-$k$ norm for an integer $k\in[n]$ if $f(x)$ equals the sum of the $k$ largest coordinates of $x$. Chakrabarty and Swamy [11] showed that any monotone symmetric norm can be written as the maximum of a family of ordered norms. This leads to the following lemma, which reduces monotone symmetric norms to top-$k$ norms. For completeness, we defer its proof to Appendix A.

Lemma 6.

If an algorithm returns a single clustering $\mathcal{C}_{ALG}$ that is simultaneously a $\rho$-approximation for the top-$k$ norm objective for every integer $k\in[n]$, then $\mathcal{C}_{ALG}$ is a $\rho$-approximation for every monotone symmetric norm $f:\mathbb{R}_{\ge 0}^n\to\mathbb{R}_+$.

For a fixed clustering $\mathcal{C}$, we already defined the disagreement vector of $\mathcal{C}$ as $\mathrm{cost}_{\mathcal{C}}\in\mathbb{R}_{\ge 0}^n$, with $\mathrm{cost}_{\mathcal{C}}(u)$ for every $u\in V$ being the number of edges incident to $u$ that are in disagreement w.r.t. $\mathcal{C}$. Given an integer $k$ and a clustering $\mathcal{C}$, we define the top-$k$ value as $\mathrm{cost}_{\mathcal{C}}^k=\max_{T\subseteq V,|T|=k}\sum_{u\in T}\mathrm{cost}_{\mathcal{C}}(u)$. Similarly, for any fractional vector $(x_{uv})_{u,v\in V}$, we define $\mathrm{cost}_x(u)=\sum_{uv\in E}x_{uv}+\sum_{uv\in\binom{V}{2}\setminus E}(1-x_{uv})$ as the disagreement of $u$ with respect to $x$. The top-$k$ value of $x$ is defined as $\mathrm{cost}_x^k=\max_{T\subseteq V,|T|=k}\sum_{u\in T}\mathrm{cost}_x(u)$.
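For concreteness, here is a small Python sketch of these fractional quantities (names are ours; pairs not stored in $x$ default to distance 1):

```python
from itertools import combinations

def fractional_disagreement(vertices, plus_edges, x):
    """cost_x(u) = sum of x_uv over +edges at u plus (1 - x_uv) over -edges at u."""
    cost = {u: 0.0 for u in vertices}
    for u, v in combinations(vertices, 2):
        d = x.get(frozenset((u, v)), 1.0)   # default 1 for pairs not explicitly stored
        contrib = d if frozenset((u, v)) in plus_edges else 1.0 - d
        cost[u] += contrib
        cost[v] += contrib
    return cost

def top_k_value(cost, k):
    """Top-k value: sum of the k largest coordinates of the vector."""
    return sum(sorted(cost.values(), reverse=True)[:k])
```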

We will use the following theorem from [26]:

Theorem 7.

Let $G=(V,E)$ be a correlation clustering instance, and let $x\in[0,1]^{\binom{V}{2}}$ be a metric over $V$ with range $[0,1]$. There is a polynomial time algorithm that, given $G$ and $x$, outputs a clustering $\mathcal{C}$ of $V$ such that $\mathrm{cost}_{\mathcal{C}}(u)\le 5\,\mathrm{cost}_x(u)$ for every $u\in V$.

We will use the following well-known concentration inequalities.

Theorem 8 (Chernoff Bound).

Let $X_1,X_2,\ldots,X_k$ be independent random variables taking values in $\{0,1\}$, and let $X=\sum_i X_i$ be their sum. Then the following inequalities hold:

  (a) For any $\epsilon\in(0,1)$, if $\mathbb{E}[X]\le U$, then $\Pr[X\ge(1+\epsilon)U]\le\exp(-\epsilon^2 U/3)$.

  (b) For any $\epsilon\in(0,1)$, if $\mathbb{E}[X]\ge U$, then $\Pr[X\le(1-\epsilon)U]\le\exp(-\epsilon^2 U/2)$.

3 Simultaneous 𝑶(𝟏)-Approximation for Top-𝒌 Norms

In this section, we describe our simultaneous 63.3-approximation for correlation clustering under all top-$k$ norms. The algorithm described in this section runs in polynomial time. It first constructs a solution to the top-$k$ linear program using a combinatorial procedure; crucially, the construction does not depend on the value of $k$. We show that the solution has cost at most 12.66 times the optimum cost under the top-$k$ norm, for every integer $k\ge 1$. Then we use the rounding algorithm of [26] to round the LP solution to an integral one. As it gives a vertex-by-vertex 5-approximation guarantee, this leads to a 63.3-approximation for the top-$k$ norm for every $k$.

The LP for minimizing the top-k norm of the clustering is given in LP (1).

$\min\ \mathrm{cost}_x^k$   (1a)
s.t. $x_{uv}+x_{uw}\ge x_{vw}$,   $\forall u,v,w\in V$,   (1b)
     $x_{uv}\in[0,1]$,   $\forall u,v\in V$,   (1c)
     $x_{uu}=0$,   $\forall u\in V$.   (1d)

In the corresponding integer program, $x_{uv}$ for every $u,v\in V$ indicates whether $u$ and $v$ are separated or not. We view $uv$ as an unordered pair, and thus $x_{uv}$ and $x_{vu}$ are the same variable. So $(x_{uv})_{u,v\in V}$ is a metric with distances in $\{0,1\}$, which is relaxed to $[0,1]$ in the linear program. This is captured by constraints (1b), (1c) and (1d). Notice that $\mathrm{cost}_x(u)$ for any $u\in V$ is a linear function of the $x$ variables. The top-$k$ norm of the fractional clustering is defined by $\mathrm{cost}_x^k=\max_{S\subseteq V:|S|=k}\sum_{u\in S}\mathrm{cost}_x(u)$. This can be captured by introducing a variable $z$ and constraints $z\ge\sum_{u\in S}\mathrm{cost}_x(u)$ for every $S\subseteq V$ of size $k$, and setting $z$ to be the objective to minimize. For simplicity, we use the form as described. Despite having an exponential number of constraints, the LP can be solved efficiently, as there is a simple separation oracle. Moreover, we use a combinatorial algorithm to construct a solution $x$, so our algorithm does not need to solve the LP.
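The separation oracle is particularly simple; a hedged sketch with our own function names: given a candidate $(x,z)$, the most violated constraint for a fixed $k$ comes from the $k$ vertices with the largest $\mathrm{cost}_x$ values.

```python
def separation_oracle(cost_x, z, k):
    """Return the most violated set S (|S| = k) for the constraint
    z >= sum_{u in S} cost_x(u), or None if no such constraint is violated."""
    top = sorted(cost_x, key=cost_x.get, reverse=True)[:k]
    return top if sum(cost_x[u] for u in top) > z else None
```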

3.1 Algorithm

The algorithm for constructing the LP solution $x$ is given in Algorithm 1. It depends on the parameter $\beta\in(0,1)$, whose value will be specified later. During the process, we construct a subgraph $H$ by removing every edge $uv\in E$ whose endpoints $u$ and $v$ have significantly different neighborhoods. We then set $x_{uv}=1-\frac{|N_H(u)\cap N_H(v)|}{M_{uv}}$ if $u\ne v$, and $x_{uv}=0$ otherwise. Recall that $M_{uv}=\max\{d(u),d(v)\}$ is the maximum of the degrees of $u$ and $v$ in the graph $G$. Intuitively, we treat $+$edges as indicators of whether two nodes belong to the same cluster. The first step removes $+$edges whose endpoints should not be in the same cluster. The second step ensures that the more common neighbors two nodes have, the smaller their distance.

Algorithm 1 Construction of norm-oblivious solution x to metric LP.
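The pseudocode of Algorithm 1 does not appear in this version; as a stand-in, here is a hedged Python sketch of the two steps described above, with our own names, and with the edge-removal test written as $|N(u)\Delta N(v)|\le\beta M_{uv}$ (the property used in Lemmas 11 and 12); the exact threshold in the paper's pseudocode may differ.

```python
from itertools import combinations

def construct_lp_solution(vertices, adj, beta):
    """Construct the norm-oblivious fractional solution x.

    `adj[u]` is N(u) in G, including the self-loop u.  Step 1 builds H by
    dropping +edges whose endpoints have very different neighborhoods;
    step 2 sets x_uv = 1 - |N_H(u) ∩ N_H(v)| / max(d(u), d(v)).
    """
    deg = {u: len(adj[u]) for u in vertices}

    # Step 1: keep only +edges whose endpoints have similar neighborhoods.
    adj_H = {u: {u} for u in vertices}          # self-loops stay in H
    for u in vertices:
        for v in adj[u]:
            if u == v:
                continue
            m_uv = max(deg[u], deg[v])
            if len(adj[u] ^ adj[v]) <= beta * m_uv:   # symmetric-difference test
                adj_H[u].add(v)

    # Step 2: assign the fractional distances for all pairs.
    x = {}
    for u, v in combinations(vertices, 2):
        m_uv = max(deg[u], deg[v])
        x[frozenset((u, v))] = 1.0 - len(adj_H[u] & adj_H[v]) / m_uv
    return x
```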

In the remaining part of this section, we will show

Lemma 9.

Algorithm 1 outputs a feasible solution $(x_{uv})_{u,v\in V}$ to LP (1) such that for every integer $k\in[n]$, we have

$\mathrm{cost}_x^k\le 12.66\,\mathrm{opt}_k,$

where $\mathrm{opt}_k$ is the cost of the optimum clustering under the top-$k$ norm.

Proof of Theorem 2.

Given Lemma 9, [26] provides a rounding algorithm for any feasible solution. Let the resulting clustering be $\mathcal{C}_{KMZ}$. [26] ensures that for every node $u$, we have $\mathrm{cost}_{\mathcal{C}_{KMZ}}(u)\le 5\,\mathrm{cost}_x(u)$; that is, the disagreement of each vertex in $\mathcal{C}_{KMZ}$ is at most 5 times its disagreement in the LP solution. We apply the KMZ rounding algorithm to the solution $x$ output by Algorithm 1 (Lemma 9). For any integer $k\in[n]$, let $U$ be the set of $k$ vertices with the largest disagreement values with respect to $\mathcal{C}_{KMZ}$. Then we have:

$\mathrm{cost}_{\mathcal{C}_{KMZ}}^k=\sum_{u\in U}\mathrm{cost}_{\mathcal{C}_{KMZ}}(u)\le 5\sum_{u\in U}\mathrm{cost}_x(u)\le 5\,\mathrm{cost}_x^k\le 5\cdot 12.66\,\mathrm{opt}_k=63.3\,\mathrm{opt}_k.$

By Lemma 6, we know that 𝒞KMZ is a simultaneous 63.3-approximate clustering for all monotone symmetric norms.

We first show that $x$ is feasible in Section 3.2, and then bound the approximation ratio in Section 3.3.

3.2 The validity of 𝒙 to LP (1)

To show that $x$ is a valid solution to LP (1), it suffices to prove that it is a metric over $V$ with range $[0,1]$. Constraints (1c) and (1d) hold trivially, so it remains to show that the triangle inequality (constraint (1b)) is satisfied:

Lemma 10.

For any $u,v,w\in V$, we have $x_{uv}+x_{uw}\ge x_{vw}$.

Proof.

We may assume $u,v,w$ are distinct, since otherwise the inequality holds trivially. We assume w.l.o.g. that $d(v)\ge d(w)$.

$x_{uv}+x_{uw}-x_{vw}$
$=\left(1-\frac{|N_H(u)\cap N_H(v)|}{M_{uv}}\right)+\left(1-\frac{|N_H(u)\cap N_H(w)|}{M_{uw}}\right)-\left(1-\frac{|N_H(v)\cap N_H(w)|}{d(v)}\right)$
$=1+\frac{|N_H(v)\cap N_H(w)|}{d(v)}-\frac{|N_H(u)\cap N_H(v)|}{M_{uv}}-\frac{|N_H(u)\cap N_H(w)|}{M_{uw}}$
$\ge 1+\frac{|N_H(v)\cap N_H(w)|}{M_{uv}}-\frac{|N_H(u)\cap N_H(v)|}{M_{uv}}-\frac{|N_H(u)\cap N_H(w)|}{M_{uw}}$
$\ge 1-\frac{|N_H(u)\setminus N_H(w)|}{M_{uv}}-\frac{|N_H(u)\cap N_H(w)|}{M_{uw}}$
$\ge 1-\frac{|N_H(u)\setminus N_H(w)|+|N_H(u)\cap N_H(w)|}{d(u)}=1-\frac{|N_H(u)|}{d(u)}\ge 0.$

The first inequality uses $d(v)\le M_{uv}$; the second follows from $|N_H(u)\cap N_H(v)|-|N_H(v)\cap N_H(w)|\le|(N_H(u)\cap N_H(v))\setminus(N_H(v)\cap N_H(w))|=|N_H(v)\cap(N_H(u)\setminus N_H(w))|\le|N_H(u)\setminus N_H(w)|$; and the third uses $d(u)\le M_{uv}$, $d(u)\le M_{uw}$ and $|N_H(u)|\le|N(u)|=d(u)$.

3.3 Bounding the Top-𝒌 Norm Cost of 𝒙

In this section, we compare the top-$k$ norm cost of $x$ to $\mathrm{opt}_k$.

Notations.

We fix the integer $k\in[n]$. Let $\mathcal{C}$ be the clustering that minimizes the top-$k$ norm of the disagreement vector; in fact, our analysis works for any clustering. For every $v\in V$, let $C(v)$ be the cluster in $\mathcal{C}$ that contains $v$.

For every $u\in V$, let $\mathrm{cost}_{\mathcal{C}}^+(u)$, $\mathrm{cost}_{\mathcal{C}}^-(u)$ and $\mathrm{cost}_{\mathcal{C}}(u)$ respectively denote the number of $+$edges, $-$edges, and all edges incident to $u$ that are in disagreement in the clustering $\mathcal{C}$. Recall that $\mathrm{cost}_{\mathcal{C}}^k=\max_{S\subseteq V:|S|=k}\sum_{u\in S}\mathrm{cost}_{\mathcal{C}}(u)$ is the top-$k$ norm cost of the clustering $\mathcal{C}$; thus we have $\mathrm{opt}_k=\mathrm{cost}_{\mathcal{C}}^k$.

Let $U$ be the set of $k$ vertices $u$ with the largest $\mathrm{cost}_x(u)$ values, so that $\mathrm{cost}_x^k=\sum_{u\in U}\mathrm{cost}_x(u)$. For clarity, we divide all pairs of vertices into five parts. First, we separate out the parts that are easily bounded by $\mathrm{cost}_{\mathcal{C}}^k$: let $\varphi_1^+$ be the set of $+$edges that are cut in $\mathcal{C}$, and $\varphi_1^-$ the set of $-$edges that are not cut in $\mathcal{C}$. For the remaining $+$edges in $E$ that are not cut in $\mathcal{C}$, it is necessary to use the properties of the $+$edges in $E_H$. To this end, let $\varphi_2^+$ be the set of $+$edges in $E\setminus E_H$ that are not cut in $\mathcal{C}$, and $\varphi_3^+$ the set of $+$edges in $E_H$ that are not cut in $\mathcal{C}$. Finally, we define $\varphi_2^-$ as the set of $-$edges that are cut in $\mathcal{C}$. Formally, we set

$\varphi_1^+:=\{uv\mid uv\in E,\ C(u)\ne C(v)\}$,  $\varphi_2^+:=\{uv\mid uv\in E\setminus E_H,\ C(u)=C(v)\}$,
$\varphi_3^+:=\{uv\mid uv\in E_H,\ C(u)=C(v)\}$,  $\varphi_1^-:=\{uv\mid uv\in\binom{V}{2}\setminus E,\ C(u)=C(v)\}$,
and $\varphi_2^-:=\{uv\mid uv\in\binom{V}{2}\setminus E,\ C(u)\ne C(v)\}$.

For every $(i,j)\in\{(1,+),(2,+),(3,+),(1,-),(2,-)\}$ and $u\in V$, we let $\varphi_i^j(u)$ be the set of pairs in $\varphi_i^j$ incident to $u$. Notice that $\varphi_3^+$ contains all the self-loops. We use $\phi_i^j(u)=\{v : uv\in\varphi_i^j(u)\}$ to denote the end-vertices of the pairs in $\varphi_i^j(u)$ other than $u$; so $|\phi_i^j(u)|=|\varphi_i^j(u)|$. We let $y_i^j(u)$ denote the cost of the pairs in $\varphi_i^j(u)$ in the solution $x$. For every $(i,j)\in\{(1,+),(2,+),(3,+),(1,-),(2,-)\}$, we define $f_i^j=\sum_{u\in U}y_i^j(u)$. Therefore, the top-$k$ norm cost of $x$ is $f_1^++f_2^++f_3^++f_1^-+f_2^-$.

With the notation defined, we can proceed to the analysis. Before that, we present several propositions that will prove useful. We start with a property of the edges in $E\setminus E_H$.

Lemma 11.

For every $uv\in\varphi_2^+$, we have $\mathrm{cost}_{\mathcal{C}}(u)+\mathrm{cost}_{\mathcal{C}}(v)\ge\beta M_{uv}$.

Proof.

Since $C(u)=C(v)$, there are at least $|N(u)\Delta N(v)|$ disagreements incident to $u$ and $v$. Moreover, since $uv\in\varphi_2^+\subseteq E\setminus E_H$, the edge $uv$ was removed when constructing $H$, so $|N(u)\Delta N(v)|\ge\beta M_{uv}$. Therefore, for every $uv\in\varphi_2^+$, we have $\mathrm{cost}_{\mathcal{C}}(u)+\mathrm{cost}_{\mathcal{C}}(v)\ge|N(u)\Delta N(v)|\ge\beta M_{uv}$.

Next, we show that the endpoints of any edge in $E_H$ have similar degrees.

Lemma 12.

For every $uv\in E_H$, we have $(1-\beta)d(v)\le d(u)\le\frac{1}{1-\beta}d(v)$.

Proof.

For every $uv\in E_H$, we have $|N(u)\Delta N(v)|\le\beta M_{uv}$. Therefore,

$\min\{d(u),d(v)\}\ \ge\ |N(u)\cap N(v)|\ =\ |N(u)\cup N(v)|-|N(u)\Delta N(v)|\ \ge\ (1-\beta)\max\{d(u),d(v)\},$

which gives us $(1-\beta)d(v)\le d(u)\le\frac{1}{1-\beta}d(v)$ since $\beta\in(0,1)$.

Since we analyze the top-$k$ norm objective, we bound the cost of the solution $x$ by a weighted sum of the $\mathrm{cost}_{\mathcal{C}}$ values. This key observation is formalized as follows:

Claim 13.

Let $c\in\mathbb{R}_{\ge 0}^n$ be a vector and $\alpha>0$ be such that $\|c\|_\infty\le\alpha$ and $\|c\|_1\le\alpha k$. Then we have

$\sum_{r\in V}c(r)\,\mathrm{cost}_{\mathcal{C}}(r)\ \le\ \alpha\,\mathrm{cost}_{\mathcal{C}}^k.$

Proof.

We have $\|c/\alpha\|_1\le k$ and $\|c/\alpha\|_\infty\le 1$. It is well known that $c/\alpha$ is then a convex combination of $\{0,1\}$-vectors, each of which has at most $k$ ones. For each $\{0,1\}$-vector $d$ in the combination, we have $\sum_{r\in V}d(r)\,\mathrm{cost}_{\mathcal{C}}(r)\le\mathrm{cost}_{\mathcal{C}}^k$. Taking the convex combination of these inequalities gives $\sum_{r\in V}\frac{c(r)}{\alpha}\,\mathrm{cost}_{\mathcal{C}}(r)\le\mathrm{cost}_{\mathcal{C}}^k$, which is equivalent to the inequality in the claim.

From the definitions of $f_1^+$ and $f_1^-$, we can see that their sum is directly bounded by $\mathrm{cost}_{\mathcal{C}}^k$.

Claim 14.

$f_1^++f_1^-\le\mathrm{cost}_{\mathcal{C}}^k$.

Proof.

$f_1^++f_1^-=\sum_{u\in U}\big(y_1^+(u)+y_1^-(u)\big)=\sum_{u\in U}\Big(\sum_{uv\in\varphi_1^+(u)}x_{uv}+\sum_{uv\in\varphi_1^-(u)}(1-x_{uv})\Big)$
$\le\sum_{u\in U}\big(|\varphi_1^+(u)|+|\varphi_1^-(u)|\big)\le\sum_{u\in U}\mathrm{cost}_{\mathcal{C}}(u)\le\mathrm{cost}_{\mathcal{C}}^k.$

To bound the cost of the remaining +edges, we separately analyze the cost coefficients of vertices in f2+ and f3+.

Lemma 15.

There exists a vector $c_2^+\in\mathbb{R}_{\ge 0}^n$ with the following properties:

  (a) $f_2^+\le\sum_{r\in V}c_2^+(r)\,\mathrm{cost}_{\mathcal{C}}(r)$.

  (b) $c_2^+(r)\le\frac{2|\varphi_2^+(r)|}{\beta d(r)}$ for every $r\in V$.

  (c) $\|c_2^+\|_1\le\frac{2}{\beta}\sum_{u\in U}\frac{|\varphi_2^+(u)|}{d(u)}$.

Proof.

We bound $f_2^+$ as follows:

$f_2^+\ \le\sum_{u\in U,\,uv\in\varphi_2^+}1\ \le\sum_{u\in U,\,uv\in\varphi_2^+}\frac{\mathrm{cost}_{\mathcal{C}}(u)+\mathrm{cost}_{\mathcal{C}}(v)}{\beta M_{uv}}\ =:\ \sum_{r\in V}c_2^+(r)\,\mathrm{cost}_{\mathcal{C}}(r).$

The second inequality uses Lemma 11. Therefore, Lemma 15(a) holds.

To show Lemma 15(b), we bound the coefficients of $\mathrm{cost}_{\mathcal{C}}(u)$ and $\mathrm{cost}_{\mathcal{C}}(v)$ separately. If $u\in U$, the coefficient of $\mathrm{cost}_{\mathcal{C}}(u)$ is $\sum_{uv\in\varphi_2^+(u)}\frac{1}{\beta M_{uv}}\le\frac{|\varphi_2^+(u)|}{\beta d(u)}$; if $u\notin U$, the coefficient is 0. The coefficient of $\mathrm{cost}_{\mathcal{C}}(v)$ is $\sum_{u\in U\cap\phi_2^+(v)}\frac{1}{\beta M_{uv}}\le\frac{|\varphi_2^+(v)|}{\beta d(v)}$. Therefore, $c_2^+(r)\le\frac{2|\varphi_2^+(r)|}{\beta d(r)}$.

To bound $\|c_2^+\|_1$, we replace $\mathrm{cost}_{\mathcal{C}}(u)$ and $\mathrm{cost}_{\mathcal{C}}(v)$ by 1. Then $\|c_2^+\|_1=\sum_{u\in U,\,uv\in\varphi_2^+}\frac{2}{\beta M_{uv}}\le\frac{2}{\beta}\sum_{u\in U}\frac{|\varphi_2^+(u)|}{d(u)}$. This proves Lemma 15(c).

Following the analysis of the cost coefficients $c_2^+$ for $f_2^+$, we analyze the cost coefficients for $f_3^+$, whose edges lie in $E_H$.

Lemma 16.

There exists a vector $c_3^+\in\mathbb{R}_{\ge 0}^n$ with the following properties:

  (a) $f_3^+\le\sum_{r\in V}c_3^+(r)\,\mathrm{cost}_{\mathcal{C}}(r)$.

  (b) $c_3^+(r)\le 2\left(\frac{|\varphi_2^+(r)||\varphi_3^+(r)|}{\beta d^2(r)}+\frac{|\varphi_2^+(r)|}{\beta d(r)}+\frac{|\varphi_3^+(r)|}{d(r)}\right)$ for every $r\in V$.

  (c) $\|c_3^+\|_1\le 2\sum_{u\in U}\left(\frac{|\varphi_2^+(u)||\varphi_3^+(u)|}{\beta d^2(u)}+\frac{|\varphi_3^+(u)|}{\beta d(u)}+\frac{|\varphi_3^+(u)|}{d(u)}\right)$.

Proof.

Fix some $uv\in\varphi_3^+\subseteq E_H$ with $u\ne v$. We let $\tilde u=\arg\max_{w\in\{u,v\}}d(w)$, and let $\tilde v$ be the other vertex in $\{u,v\}$. Notice that $d(\tilde u)=M_{uv}$. Then we have

$x_{uv}=1-\frac{|N_H(u)\cap N_H(v)|}{d(\tilde u)}$  (2)
$=\frac{d(\tilde u)-|N_H(u)\cap N_H(v)|}{d(\tilde u)}=\frac{|\varphi_1^+(\tilde u)|+|\varphi_2^+(\tilde u)|+|\varphi_3^+(\tilde u)|-|N_H(\tilde u)\cap N_H(\tilde v)|}{d(\tilde u)}$
$\le\frac{\mathrm{cost}_{\mathcal{C}}^+(\tilde u)+|\varphi_2^+(\tilde u)|+|\varphi_2^+(\tilde v)|+\mathrm{cost}_{\mathcal{C}}^-(\tilde v)}{d(\tilde u)}\le\frac{|\varphi_2^+(u)|+|\varphi_2^+(v)|+\mathrm{cost}_{\mathcal{C}}(u)+\mathrm{cost}_{\mathcal{C}}(v)}{M_{uv}}.$  (3)

We prove the first inequality in (3). Notice that $N_H(\tilde u)\supseteq\phi_3^+(\tilde u)$. Therefore,

$|\varphi_3^+(\tilde u)|-|N_H(\tilde u)\cap N_H(\tilde v)|\ \le\ |\phi_3^+(\tilde u)|-|\phi_3^+(\tilde u)\cap N_H(\tilde v)|$
$=|\phi_3^+(\tilde u)\setminus N_H(\tilde v)|\ \le\ |\phi_2^+(\tilde v)\cup\phi_1^-(\tilde v)|$
$=|\varphi_2^+(\tilde v)\cup\varphi_1^-(\tilde v)|\ \le\ |\varphi_2^+(\tilde v)|+\mathrm{cost}_{\mathcal{C}}^-(\tilde v).$

Above, we used that $\phi_3^+(\tilde u)\setminus N_H(\tilde v)\subseteq C(\tilde u)\setminus N_H(\tilde v)=C(\tilde v)\setminus N_H(\tilde v)\subseteq\phi_2^+(\tilde v)\cup\phi_1^-(\tilde v)$. Also, (3) holds trivially when $u=v$; so it holds for every $uv\in\varphi_3^+$.

Therefore,

$f_3^+=\sum_{u\in U}y_3^+(u)\ \le\sum_{u\in U,\,uv\in\varphi_3^+}\frac{|\varphi_2^+(u)|+|\varphi_2^+(v)|+\mathrm{cost}_{\mathcal{C}}(u)+\mathrm{cost}_{\mathcal{C}}(v)}{M_{uv}}$
$\le\sum_{u\in U,\,uv\in\varphi_3^+,\,uw\in\varphi_2^+}\frac{1}{M_{uv}}+\sum_{u\in U,\,uv\in\varphi_3^+,\,vw\in\varphi_2^+}\frac{1}{M_{uv}}+\sum_{u\in U,\,uv\in\varphi_3^+}\frac{\mathrm{cost}_{\mathcal{C}}(u)+\mathrm{cost}_{\mathcal{C}}(v)}{M_{uv}}$
$\le\sum_{u\in U,\,uv\in\varphi_3^+,\,uw\in\varphi_2^+}\frac{\mathrm{cost}_{\mathcal{C}}(u)+\mathrm{cost}_{\mathcal{C}}(w)}{\beta M_{uv}M_{uw}}+\sum_{u\in U,\,uv\in\varphi_3^+,\,vw\in\varphi_2^+}\frac{\mathrm{cost}_{\mathcal{C}}(v)+\mathrm{cost}_{\mathcal{C}}(w)}{\beta M_{uv}M_{vw}}$
$\quad+\sum_{u\in U,\,uv\in\varphi_3^+}\frac{\mathrm{cost}_{\mathcal{C}}(u)+\mathrm{cost}_{\mathcal{C}}(v)}{M_{uv}}\ =:\ \sum_{r\in V}c_3^+(r)\,\mathrm{cost}_{\mathcal{C}}(r).$

Again, we used Lemma 11 twice for the last inequality. Therefore, Lemma 16(a) holds.

To prove Lemma 16(b), we consider the coefficients of $\mathrm{cost}_{\mathcal{C}}(u)$, $\mathrm{cost}_{\mathcal{C}}(v)$ and $\mathrm{cost}_{\mathcal{C}}(w)$ separately. For $u\in U$, the coefficient of $\mathrm{cost}_{\mathcal{C}}(u)$ is

$\sum_{uv\in\varphi_3^+,\,uw\in\varphi_2^+}\frac{1}{\beta M_{uv}M_{uw}}+\sum_{uv\in\varphi_3^+}\frac{1}{M_{uv}}\ \le\ \frac{|\varphi_3^+(u)||\varphi_2^+(u)|}{\beta d^2(u)}+\frac{|\varphi_3^+(u)|}{d(u)}.$

The coefficient of $\mathrm{cost}_{\mathcal{C}}(v)$ is

$\sum_{u\in U\cap\phi_3^+(v),\,vw\in\varphi_2^+}\frac{1}{\beta M_{uv}M_{vw}}+\sum_{u\in U\cap\phi_3^+(v)}\frac{1}{M_{uv}}\ \le\ \frac{|\varphi_3^+(v)||\varphi_2^+(v)|}{\beta d^2(v)}+\frac{|\varphi_3^+(v)|}{d(v)}.$

The coefficient of $\mathrm{cost}_{\mathcal{C}}(w)$ is at most

$\sum_{u\in U\cap\phi_2^+(w),\,uv\in\varphi_3^+}\frac{1}{\beta M_{uv}M_{uw}}+\sum_{vw\in\varphi_2^+,\,u\in U\cap\phi_3^+(v)}\frac{1}{\beta M_{uv}M_{vw}}$
$\le\sum_{u\in U\cap\phi_2^+(w)}\frac{d(u)}{\beta d(u)M_{uw}}+\sum_{vw\in\varphi_2^+}\frac{d(v)}{\beta d(v)M_{vw}}$
$=\sum_{u\in U\cap\phi_2^+(w)}\frac{1}{\beta M_{uw}}+\sum_{vw\in\varphi_2^+}\frac{1}{\beta M_{vw}}\ \le\ \frac{|\varphi_2^+(w)|}{\beta d(w)}+\frac{|\varphi_2^+(w)|}{\beta d(w)}=\frac{2|\varphi_2^+(w)|}{\beta d(w)}.$

Therefore, for every $r\in V$, we have $c_3^+(r)\le 2\left(\frac{|\varphi_2^+(r)||\varphi_3^+(r)|}{\beta d^2(r)}+\frac{|\varphi_2^+(r)|}{\beta d(r)}+\frac{|\varphi_3^+(r)|}{d(r)}\right)$.

We then bound $\|c_3^+\|_1$ as follows:

$\|c_3^+\|_1=\sum_{u\in U,\,uv\in\varphi_3^+,\,uw\in\varphi_2^+}\frac{2}{\beta M_{uv}M_{uw}}+\sum_{u\in U,\,uv\in\varphi_3^+,\,vw\in\varphi_2^+}\frac{2}{\beta M_{uv}M_{vw}}+\sum_{u\in U,\,uv\in\varphi_3^+}\frac{2}{M_{uv}}$
$\le\sum_{u\in U}\left(\frac{2|\varphi_2^+(u)||\varphi_3^+(u)|}{\beta d(u)\,d(u)}+\frac{2|\varphi_3^+(u)|}{\beta d(u)}+\frac{2|\varphi_3^+(u)|}{d(u)}\right)$
$=2\sum_{u\in U}\left(\frac{|\varphi_2^+(u)||\varphi_3^+(u)|}{\beta d^2(u)}+\frac{|\varphi_3^+(u)|}{\beta d(u)}+\frac{|\varphi_3^+(u)|}{d(u)}\right).$

This finishes the proof of Lemma 16(c), and thus of Lemma 16.

With Lemma 15 and Lemma 16, we can then bound $f_2^++f_3^+$:

Lemma 17.

$f_2^++f_3^+\le\frac{4}{\beta}\,\mathrm{cost}_{\mathcal{C}}^k$.

Proof.

We define $c(r)=c_2^+(r)+c_3^+(r)$ for every $r\in V$. Then,

$c(r)\ \le\ \frac{2|\varphi_2^+(r)|}{\beta d(r)}+2\left(\frac{|\varphi_2^+(r)||\varphi_3^+(r)|}{\beta d^2(r)}+\frac{|\varphi_2^+(r)|}{\beta d(r)}+\frac{|\varphi_3^+(r)|}{d(r)}\right)$
$=2\left(\frac{|\varphi_2^+(r)||\varphi_3^+(r)|}{\beta d^2(r)}+\frac{|\varphi_3^+(r)|}{d(r)}+\frac{2|\varphi_2^+(r)|}{\beta d(r)}\right)$
$\le 2\left(\frac{|\varphi_3^+(r)|}{\beta d(r)}+\frac{|\varphi_3^+(r)|}{\beta d(r)}+\frac{2|\varphi_2^+(r)|}{\beta d(r)}\right)$
$\le\ \frac{4}{\beta},$

where the first inequality holds because of Lemma 15(b) and Lemma 16(b), the second because $|\varphi_2^+(r)|\le d(r)$ and $\beta<1$, and the last because $|\varphi_2^+(r)|+|\varphi_3^+(r)|\le d(r)$.

Similarly, we have

$\|c\|_1\ \le\ \frac{2}{\beta}\sum_{u\in U}\frac{|\varphi_2^+(u)|}{d(u)}+2\sum_{u\in U}\left(\frac{|\varphi_2^+(u)||\varphi_3^+(u)|}{\beta d^2(u)}+\frac{|\varphi_3^+(u)|}{\beta d(u)}+\frac{|\varphi_3^+(u)|}{d(u)}\right)$
$=2\sum_{u\in U}\left(\frac{|\varphi_2^+(u)||\varphi_3^+(u)|}{\beta d^2(u)}+\frac{|\varphi_3^+(u)|}{\beta d(u)}+\frac{|\varphi_3^+(u)|}{d(u)}+\frac{|\varphi_2^+(u)|}{\beta d(u)}\right)$
$\le 2\sum_{u\in U}\left(\frac{|\varphi_2^+(u)|}{\beta d(u)}+\frac{|\varphi_3^+(u)|}{\beta d(u)}+\frac{|\varphi_3^+(u)|}{\beta d(u)}+\frac{|\varphi_2^+(u)|}{\beta d(u)}\right)$
$\le\ \frac{4}{\beta}k,$

where the first inequality holds because of Lemma 15(c) and Lemma 16(c), the second because $\beta<1$ and $|\varphi_3^+(u)|\le d(u)$ for every $u\in U$, and the last because $|\varphi_2^+(u)|+|\varphi_3^+(u)|\le d(u)$ for every $u\in U$.

The lemma then follows from Claim 13 with $\alpha=\frac{4}{\beta}$.

For the cost of the remaining $-$edges, i.e., $f_2^-$, we also analyze the cost coefficient of each vertex.

Lemma 18.

There exists a vector $c_2^-\in\mathbb{R}_{\ge 0}^n$ with the following properties:

  (a) $f_2^-\le\sum_{r\in V}c_2^-(r)\,\mathrm{cost}_{\mathcal{C}}(r)$.

  (b) $c_2^-(r)\le\frac{2}{1-\beta}$ for every $r\in V$.

  (c) $\|c_2^-\|_1\le\frac{2k}{1-\beta}$.

Proof.

We have

$f_2^-=\sum_{u\in U,\,uv\in\varphi_2^-}(1-x_{uv})=\sum_{u\in U,\,uv\in\varphi_2^-}\frac{|N_H(u)\cap N_H(v)|}{M_{uv}}=\sum_{u\in U,\,uv\in\varphi_2^-}\ \sum_{w\in N_H(u)\cap N_H(v)}\frac{1}{M_{uv}}.$

Given that $uv\in\varphi_2^-$ implies $C(u)\ne C(v)$, we distinguish two cases: $C(w)=C(u)$ and $C(w)\ne C(u)$. For simplicity, in the summations below we implicitly impose the constraints $u\in U$, $uv\in\varphi_2^-$ and $uw,vw\in E_H$ whenever the corresponding vertices appear. Notice that we have $d(v)\ge(1-\beta)d(w)$ by Lemma 12.

$\sum_{u,v,w:\,C(w)=C(u)}\frac{1}{M_{uv}}\ \le\ \sum_{u,w:\,C(w)=C(u)}\frac{\mathrm{cost}_{\mathcal{C}}^+(w)}{\max(d(u),(1-\beta)d(w))}.$  (4)
$\sum_{u,v,w:\,C(w)\ne C(u)}\frac{1}{M_{uv}}\ \le\ \sum_{u,w:\,C(w)\ne C(u)}\frac{d(w)}{d(u)}\ \le\ \sum_{u,w:\,C(w)\ne C(u)}\frac{1}{1-\beta}\ \le\ \sum_{u}\frac{\mathrm{cost}_{\mathcal{C}}^+(u)}{1-\beta}.$  (5)

Adding (4) and (5), and using that $\mathrm{cost}_{\mathcal{C}}^+(u)\le\mathrm{cost}_{\mathcal{C}}(u)$ for every $u\in V$, we get

$f_2^-\ \le\sum_{u\in U,\,w\in N_H(u):\,C(w)=C(u)}\frac{\mathrm{cost}_{\mathcal{C}}(w)}{\max(d(u),(1-\beta)d(w))}+\sum_{u\in U}\frac{\mathrm{cost}_{\mathcal{C}}(u)}{1-\beta}\ =:\ \sum_{r\in V}c_2^-(r)\,\mathrm{cost}_{\mathcal{C}}(r).$

Therefore, Lemma 18(a) holds.

To prove Lemma 18(b), we consider the coefficients of $\mathrm{cost}_{\mathcal{C}}(u)$ and $\mathrm{cost}_{\mathcal{C}}(w)$ separately. The coefficient of $\mathrm{cost}_{\mathcal{C}}(u)$ is $\frac{1}{1-\beta}$. The coefficient of $\mathrm{cost}_{\mathcal{C}}(w)$ is

$\sum_{u\in U,\,w\in N_H(u):\,C(w)=C(u)}\frac{1}{\max(d(u),(1-\beta)d(w))}\ \le\sum_{u\in U,\,w\in N_H(u):\,C(w)=C(u)}\frac{1}{(1-\beta)d(w)}\ \le\ \frac{1}{1-\beta}.$

Therefore, for any $r\in V$, we have $c_2^-(r)\le\frac{2}{1-\beta}$.

By replacing $\mathrm{cost}_{\mathcal{C}}(u)$ and $\mathrm{cost}_{\mathcal{C}}(w)$ with 1, we finish the proof of Lemma 18(c) as follows:

$\|c_2^-\|_1\ \le\sum_{u\in U,\,w\in N_H(u):\,C(w)=C(u)}\frac{1}{\max(d(u),(1-\beta)d(w))}+\sum_{u\in U}\frac{1}{1-\beta}$
$\le\sum_{u\in U,\,w\in N_H(u):\,C(w)=C(u)}\frac{1}{d(u)}+\sum_{u\in U}\frac{1}{1-\beta}$
$\le\sum_{u\in U}\frac{|N_H(u)|}{d(u)}+\sum_{u\in U}\frac{1}{1-\beta}\ \le\sum_{u\in U}1+\sum_{u\in U}\frac{1}{1-\beta}\ \le\sum_{u\in U}\frac{2}{1-\beta}\ =\ \frac{2k}{1-\beta}.$

With Lemma 18, we can then bound $f_2^-$.

Lemma 19.

$f_2^-\le\frac{2}{1-\beta}\,\mathrm{cost}_{\mathcal{C}}^k$.

Proof.

By Claim 13 and Lemma 18, we have $f_2^-\le\sum_{r\in V}c_2^-(r)\,\mathrm{cost}_{\mathcal{C}}(r)\le\frac{2}{1-\beta}\,\mathrm{cost}_{\mathcal{C}}^k$.

So the final ratio achieved by Algorithm 1 is $1+\frac{4}{\beta}+\frac{2}{1-\beta}$. Setting $\beta=0.5858$, the ratio is at most 12.66. This finishes the proof of Lemma 9.
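The constant can be checked numerically with a two-line computation; the closed form $\beta=2-\sqrt{2}$ below is our own observation of the minimizer of $4/\beta+2/(1-\beta)$.

```python
import math

beta = 2 - math.sqrt(2)           # ≈ 0.5858, minimizes 4/b + 2/(1 - b)
ratio = 1 + 4 / beta + 2 / (1 - beta)
print(beta, ratio)                # ≈ 0.5858, ≈ 12.657 <= 12.66
```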

References

  • [1] Rakesh Agrawal, Alan Halverson, Krishnaram Kenthapadi, Nina Mishra, and Panayiotis Tsaparas. Generating labels from clicks. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 172–181, 2009. doi:10.1145/1498759.1498824.
  • [2] Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: Ranking and clustering. Journal of the ACM, 55(5):1–27, 2008. doi:10.1145/1411509.1411513.
  • [3] Sepehr Assadi and Chen Wang. Sublinear time and space algorithms for correlation clustering via sparse-dense decompositions. In Proceedings of the 13th Conference on Innovations in Theoretical Computer Science (ITCS), volume 215 of LIPIcs, pages 10:1–10:20, 2022. doi:10.4230/LIPICS.ITCS.2022.10.
  • [4] Nikhil Bansal, Avrim Blum, and Shuchi Chawla. Correlation clustering. Machine learning, 56(1):89–113, 2004. doi:10.1023/B:MACH.0000033116.57574.95.
  • [5] Guy E Blelloch, Jeremy T Fineman, and Julian Shun. Greedy sequential maximal independent set and matching are parallel on average. In Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures, pages 308–317, 2012. doi:10.1145/2312005.2312058.
  • [6] Mélanie Cambus, Davin Choo, Havu Miikonen, and Jara Uitto. Massively parallel correlation clustering in bounded arboricity graphs. In 35th International Symposium on Distributed Computing (DISC), volume 209 of LIPIcs, pages 15:1–15:18, 2021. doi:10.4230/LIPICS.DISC.2021.15.
  • [7] Mélanie Cambus, Fabian Kuhn, Etna Lindy, Shreyas Pai, and Jara Uitto. A (3+ε)-approximate correlation clustering algorithm in dynamic streams. In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2861–2880. SIAM, 2024. doi:10.1137/1.9781611977912.101.
  • [8] Nairen Cao, Vincent Cohen-Addad, Euiwoong Lee, Shi Li, Alantha Newman, and Lukas Vogl. Understanding the cluster linear program for correlation clustering. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 1605–1616, 2024. doi:10.1145/3618260.3649749.
  • [9] Nairen Cao, Shang-En Huang, and Hsin-Hao Su. Breaking 3-factor approximation for correlation clustering in polylogarithmic rounds. In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 4124–4154. SIAM, 2024. doi:10.1137/1.9781611977912.143.
  • [10] Deepayan Chakrabarti, Ravi Kumar, and Kunal Punera. A graph-theoretic approach to webpage segmentation. In Proceedings of the 17th International conference on World Wide Web (WWW), pages 377–386, 2008. doi:10.1145/1367497.1367549.
  • [11] Deeparnab Chakrabarty and Chaitanya Swamy. Approximation algorithms for minimum norm and ordered optimization problems. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 126–137, 2019. doi:10.1145/3313276.3316322.
  • [12] Moses Charikar, Neha Gupta, and Roy Schwartz. Local guarantees in graph cuts and clustering. In International Conference on Integer Programming and Combinatorial Optimization (IPCO), pages 136–147, 2017. doi:10.1007/978-3-319-59250-3_12.
  • [13] Moses Charikar, Venkatesan Guruswami, and Anthony Wirth. Clustering with qualitative information. Journal of Computer and System Sciences, 71(3):360–383, 2005. doi:10.1016/J.JCSS.2004.10.012.
  • [14] Shuchi Chawla, Konstantin Makarychev, Tselil Schramm, and Grigory Yaroslavtsev. Near optimal LP rounding algorithm for correlation clustering on complete and complete k-partite graphs. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC), pages 219–228, 2015. doi:10.1145/2746539.2746604.
  • [15] Yudong Chen, Sujay Sanghavi, and Huan Xu. Clustering sparse graphs. In Advances in Neural Information Processing Systems (Neurips), pages 2204–2212, 2012.
  • [16] Flavio Chierichetti, Nilesh Dalvi, and Ravi Kumar. Correlation clustering in MapReduce. In Proceedings of the 20th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 641–650, 2014. doi:10.1145/2623330.2623743.
  • [17] Vincent Cohen-Addad, Silvio Lattanzi, Slobodan Mitrović, Ashkan Norouzi-Fard, Nikos Parotsidis, and Jakub Tarnawski. Correlation clustering in constant many parallel rounds. In International Conference on Machine Learning, pages 2069–2078. PMLR, 2021. URL: http://proceedings.mlr.press/v139/cohen-addad21b.html.
  • [18] Vincent Cohen-Addad, Euiwoong Lee, Shi Li, and Alantha Newman. Handling correlated rounding error via preclustering: A 1.73-approximation for correlation clustering. In Proceedings of the 64th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2023.
  • [19] Vincent Cohen-Addad, Euiwoong Lee, and Alantha Newman. Correlation clustering with Sherali-Adams. In Proceedings of 63rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 651–661, 2022. doi:10.1109/FOCS54457.2022.00068.
  • [20] Sami Davies, Benjamin Moseley, and Heather Newman. Fast combinatorial algorithms for min max correlation clustering. arXiv preprint arXiv:2301.13079, 2023. doi:10.48550/arXiv.2301.13079.
  • [21] Sami Davies, Benjamin Moseley, and Heather Newman. Simultaneously approximating all lp-norms in correlation clustering. In 51st International Colloquium on Automata, Languages, and Programming, ICALP 2024, 2024.
  • [22] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008. doi:10.1145/1327452.1327492.
  • [23] Manuela Fischer and Andreas Noever. Tight analysis of parallel randomized greedy mis. ACM Transactions on Algorithms (TALG), 16(1):1–13, 2019. doi:10.1145/3326165.
  • [24] Holger SG Heidrich, Jannik Irmai, and Bjoern Andres. A 4-approximation algorithm for min max correlation clustering. In AISTATS, pages 1945–1953, 2024.
  • [25] Dmitri V. Kalashnikov, Zhaoqi Chen, Sharad Mehrotra, and Rabia Nuray-Turan. Web people search via connection analysis. IEEE Transactions on Knowledge and Data Engineering, 20(11):1550–1565, 2008. doi:10.1109/TKDE.2008.78.
  • [26] Sanchit Kalhan, Konstantin Makarychev, and Timothy Zhou. Correlation clustering with local objectives. Advances in Neural Information Processing Systems, 32, 2019.
  • [27] Michael Luby. A simple parallel algorithm for the maximal independent set problem. In Proceedings of the seventeenth annual ACM symposium on Theory of computing, pages 1–10, 1985. doi:10.1145/22145.22146.
  • [28] Xinghao Pan, Dimitris S. Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, and Michael I. Jordan. Parallel correlation clustering on big graphs. In Advances in Neural Information Processing Systems (Neurips), pages 82–90, 2015. URL: https://proceedings.neurips.cc/paper/2015/hash/b53b3a3d6ab90ce0268229151c9bde11-Abstract.html.
  • [29] Gregory Puleo and Olgica Milenkovic. Correlation clustering and biclustering with locally bounded errors. In International Conference on Machine Learning, pages 869–877. PMLR, 2016. URL: http://proceedings.mlr.press/v48/puleo16.html.
  • [30] Jessica Shi, Laxman Dhulipala, David Eisenstat, Jakub Łącki, and Vahab Mirrokni. Scalable community detection via parallel correlation clustering. arXiv preprint arXiv:2108.01731, 2021.
  • [31] Nate Veldt, David F. Gleich, and Anthony Wirth. A correlation clustering framework for community detection. In Proceedings of the 2018 ACM World Wide Web Conference (WWW), pages 439–448, 2018. doi:10.1145/3178876.3186110.

Appendix A Reduction from All Monotone Symmetric Norms to Top-𝒌 Norms

Definition 20 (Ordered Norms).

For any vector $x\in\mathbb{R}_{\ge 0}^n$, let $x^{\downarrow}$ denote the vector $x$ with its coordinates sorted in non-increasing order. Given a weight vector $w\in\mathbb{R}_{\ge 0}^n$ with $w_1\ge w_2\ge\cdots\ge w_n$, the $w$-ordered norm of $x$ is defined as $\mathrm{order}(w;x)=\sum_{i=1}^n w_i x_i^{\downarrow}$.

Lemma 21 (Lemma 5.2 of [11]).

For any monotone and symmetric norm $f:\mathbb{R}_+^n\to\mathbb{R}_+$, define the set $\mathbb{B}_+(f):=\{x\in\mathbb{R}_+^n : f(x)\le 1\}$, and $W=\{w\in\mathbb{R}_+^n : w_1\ge w_2\ge\cdots\ge w_n,\ w\text{ is a subgradient of }f\text{ at some }x\in\mathbb{B}_+(f)\}$. Then we have $f(x)=\max_{w\in W}\mathrm{order}(w;x)$ for every $x\in\mathbb{R}_{\ge 0}^n$.

We restate Lemma 6, which we prove below.

Proof of Lemma 6.

For any $w=(w_1,w_2,\ldots,w_n)$ with $w_1\ge w_2\ge\cdots\ge w_n\ge 0$, define $w'=(w'_1,w'_2,\ldots,w'_n)$ by

$w'_i=\begin{cases}w_i-w_{i+1} & i\in[1,n-1]\\ w_n & i=n\end{cases}$

Let $\mathrm{top}_k(x)$ denote the top-$k$ norm of $x$. Then we have $\mathrm{order}(w;x)=\sum_{k=1}^n w'_k\,\mathrm{top}_k(x)$.

Let $\mathbb{B}_+(f):=\{x\in\mathbb{R}_+^n : f(x)\le 1\}$ and $W=\{w\in\mathbb{R}_+^n : w_1\ge w_2\ge\cdots\ge w_n,\ w\text{ is a subgradient of }f\text{ at some }x\in\mathbb{B}_+(f)\}$. We construct a new set $W'=\{w'\mid w\in W\}$, where $w'=(w_1-w_2,\,w_2-w_3,\,\ldots,\,w_{n-1}-w_n,\,w_n)$ as above. By Lemma 21, we have $f(x)=\max_{w\in W}\mathrm{order}(w;x)=\max_{w'\in W'}\sum_{k=1}^n w'_k\,\mathrm{top}_k(x)$.

Let $y$ be the disagreement vector of the given clustering $\mathcal{C}_{ALG}$. For any monotone symmetric norm $f:\mathbb{R}_{\ge 0}^n\to\mathbb{R}_{\ge 0}$, let $y^f$ be the disagreement vector of the optimal clustering under the norm $f$. By the assumption that $\mathcal{C}_{ALG}$ is a simultaneous $\rho$-approximation for every top-$k$ norm, we have $\mathrm{top}_k(y)\le\rho\,\mathrm{top}_k(y^{\text{top-}k})$ for every $k\in[n]$, where $y^{\text{top-}k}$ is the disagreement vector of the optimal clustering under the top-$k$ objective. Now we bound $f(y)$ in terms of $f(y^f)$ for any monotone symmetric norm $f$:

$f(y)=\max_{w'\in W'}\sum_{k=1}^n w'_k\,\mathrm{top}_k(y)\ \le\ \max_{w'\in W'}\sum_{k=1}^n w'_k\,\rho\,\mathrm{top}_k(y^{\text{top-}k})$
$=\rho\max_{w'\in W'}\sum_{k=1}^n w'_k\,\mathrm{top}_k(y^{\text{top-}k})\ \le\ \rho\max_{w'\in W'}\sum_{k=1}^n w'_k\,\mathrm{top}_k(y^f)\ =\ \rho f(y^f).$
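The telescoping identity $\mathrm{order}(w;x)=\sum_k w'_k\,\mathrm{top}_k(x)$ used in the proof is easy to check numerically; a small self-contained sketch (example numbers are arbitrary):

```python
def order_norm(w, x):
    """order(w; x) = sum_i w_i * x_sorted_desc_i, with w non-increasing."""
    xs = sorted(x, reverse=True)
    return sum(wi * xi for wi, xi in zip(w, xs))

def top_k(x, k):
    return sum(sorted(x, reverse=True)[:k])

w = [5.0, 3.0, 3.0, 1.0]                 # non-increasing weights
x = [2.0, 7.0, 1.0, 4.0]
w_prime = [w[i] - w[i + 1] for i in range(len(w) - 1)] + [w[-1]]
lhs = order_norm(w, x)
rhs = sum(w_prime[k] * top_k(x, k + 1) for k in range(len(w)))
assert abs(lhs - rhs) < 1e-9             # both equal 5*7 + 3*4 + 3*2 + 1*1 = 54
```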

Appendix B Implementing Algorithm 1 in Nearly Linear Time

In this section, we show how to run Algorithm 1 approximately in nearly linear time. Indeed, the algorithm can be implemented in the MPC model with $O(1)$ rounds. More precisely, we will show the following theorem:

Theorem 22.

Let $\epsilon>0$ and $\delta>0$ be small enough constants. Given a graph $G=(V,E)$, there exists an MPC algorithm that computes a solution $\{\tilde x_{uv}\}_{u,v\in V}$ such that

  1. For any integer $k\in[n]$, we have $\mathrm{cost}_{\tilde x}^k\le(12.66+\epsilon)\,\mathrm{opt}_k$.

  2. For any $u,v,w\in V$, we have $\tilde x_{uv}+\tilde x_{uw}+\epsilon\ge\tilde x_{vw}$.

The algorithm succeeds with probability at least $1-1/n$. Moreover, the algorithm runs in $O(1)$ rounds, has a total work of $\widetilde{O}(m/\epsilon^6)$, requires $O(n^{\delta})$ memory per machine, and uses $\widetilde{O}(m/\epsilon^6)$ total memory.

We give the nearly linear time implementation of Algorithm 1 in Algorithm 2. Line 2 constructs the graph $H$ efficiently. Lines 3-6 find the set $K$ of pairs to which we want to assign explicit LP values; any pair $uv\notin K$ simply gets LP value 1. Finally, Lines 7 to 14 set up $\tilde x_{uv}$ so that the conditions of Theorem 22 are satisfied. The full version of this paper contains a complete analysis of Algorithm 2 and the proof of Theorem 22.

Algorithm 2 Nearly Efficient Algorithm for Algorithm 1.

Appendix C Rounding

We will present a nearly linear time rounding algorithm. Furthermore, our algorithm only takes O~(1) rounds in the MPC model. The purpose of this section is to show

We restate Theorem 4 here for convenience.

We emphasize that even if the LP values satisfy the exact triangle inequality rather than an approximate one, the $\epsilon$ terms in the approximation ratio will still be present. These $\epsilon$ terms arise from two sources: the approximate triangle inequality itself, and the inherent characteristics of our MPC algorithm.

Given Theorem 22 and Theorem 4, we are now able to show the main result of this paper.

Proof of Theorem 3.

We first run Algorithm 2, which outputs $\tilde x_{uv}$, and feed it as input to Algorithm 3. Note that by Theorem 22, for any $k\in[n]$ we have

$\mathrm{cost}_{\tilde x}^k\le(12.66+\epsilon)\,\mathrm{opt}_k,$

where $\mathrm{opt}_k$ is the cost of the optimal correlation clustering solution under the top-$k$ norm objective. By Theorem 4, we know that the final clustering $\mathcal{C}$ satisfies

$\mathrm{cost}_{\mathcal{C}}^k\le(5+55\epsilon)\,\mathrm{cost}_{\tilde x}^k\le(63.3+O(\epsilon))\,\mathrm{opt}_k.$

By Lemma 6, we know that 𝒞 is a simultaneous (63.3+O(ϵ))-approximate clustering for all monotone symmetric norms.

For the number of rounds, Algorithm 2 takes $O(1)$ rounds and Algorithm 3 takes $O(\log^3 n/\epsilon)$ rounds, resulting in a total of $O(\log^3 n/\epsilon)$ rounds.

Algorithm 2 requires a total memory of $\widetilde{O}(m/\epsilon^6)$ and a total work of $\widetilde{O}(m/\epsilon^6)$ in the MPC model. By analyzing Algorithm 2, we know that $|K|=O(m\log n/\epsilon)$. Therefore, Algorithm 3 requires a total memory of $\widetilde{O}(m/\epsilon^2)$ and a total work of $\widetilde{O}(m/\epsilon)$ in the MPC model, as established by Theorem 4. In total, the algorithm requires $\widetilde{O}(m/\epsilon^6)$ total memory and $\widetilde{O}(m/\epsilon^6)$ total work in the MPC model.

Rounding Algorithm

Assume that we are given an instance graph $G=(V,E)$ and an LP solution $(x_{uv})_{u,v\in V}$ such that $x_{uv}+x_{uw}+\epsilon\ge x_{vw}$ for all $u,v,w\in V$. Given a set $V_t$ of unclustered vertices, a node $w$ and a radius $r$, define the ball centered at $w$ with radius $r$ as $\mathrm{Ball}_{V_t}(w,r)=\{u\mid u\in V_t,\ x_{wu}\le r\}$. Define $L_t(w)=\sum_{u\in\mathrm{Ball}_{V_t}(w,r)}(r-x_{uw})$. Note that $L_t(w)\ge r$, since $w$ itself is in $\mathrm{Ball}_{V_t}(w,r)$.

Algorithm Description.

Our algorithm works as follows: in each round, we choose a set of cluster centers and form a cluster from the ball of radius $2/5$ around each chosen center. More precisely, at step $t$, let $V_t$ be the set of unclustered vertices. We first compute a value $L_t^{\max}$ such that $L_t(u)<(1+\epsilon)L_t^{\max}$ for every $u\in V_t$. Any node $u$ with $L_t(u)\ge L_t^{\max}$ is added to the set $M_t$ of candidate cluster centers (Lines 6-10). Then, we compute cluster centers by adding each vertex of $M_t$ to the set $S_t$ with probability $p_t$, where the probability is smaller when more vertices of $M_t$ lie in each other's balls of radius $2/5$ (Lines 11-14). After that, to avoid conflicts, we remove some cluster centers from $S_t$ if they are too close to each other, obtaining the final cluster center set $H_t$ (Line 15). Let $F_t=\mathrm{Ball}_{V_t}(H_t,2/5)$ be the set of nodes clustered at step $t$; each $u\in F_t$ is added to the cluster of the center in $H_t$ with minimum ID. We then remove the clustered nodes and repeat the above process until all vertices are clustered.

Algorithm 3 The Rounding Algorithm.
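The body of Algorithm 3 is not included here; the following hedged Python sketch captures one round of the procedure described above. The sampling probability $p$, the conflict rule (greedy by ID instead of Luby's algorithm), and all names are simplifications of ours, not the paper's exact choices.

```python
import random

def one_mpc_round(unclustered, dist, eps, radius=0.4, p=0.5):
    """One round: pick candidate centers with near-maximal L_t, sample them,
    drop conflicting centers, and assign each vertex within `radius` of a
    surviving center to the surviving center with minimum ID."""
    def L(w):
        return sum(radius - dist(w, u) for u in unclustered if dist(w, u) <= radius)

    L_max = max(L(u) for u in unclustered)
    candidates = [u for u in unclustered if L(u) >= L_max / (1 + eps)]
    sampled = [u for u in candidates if random.random() < p]

    centers = []
    for u in sorted(sampled):                       # greedy stand-in for Luby's MIS
        if all(dist(u, c) > 2 * radius for c in centers):
            centers.append(u)

    clusters = {c: [] for c in centers}
    for v in list(unclustered):
        close = [c for c in centers if dist(c, v) <= radius]
        if close:
            clusters[min(close)].append(v)
            unclustered.discard(v)
    return clusters
```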

Complete analysis of Algorithm 3 and proof of Theorem 4 appear in the full version.

Appendix D A Constant Round MPC Algorithm

We will prove Theorem 5 in this section; we restate it here for convenience.

Algorithm

Theorem 3 gives us an $O(\log^3 n)$-round MPC algorithm; the bottleneck is the rounding procedure. To achieve a constant-round MPC algorithm, instead of setting up the LP and rounding it, we use the pre-clustering algorithm from [17], which is very useful for $\ell_1$-norm correlation clustering. We show that the pre-clustering algorithm also provides an $O(1)$-approximation ratio for all monotone symmetric norms simultaneously.

Algorithm Description.

The algorithm from [17] is parameterized by β and λ. It has three steps:

  1. The first step is the same as the first step of Algorithm 1, where we compute the graph $H$ (Line 2).

  2. The algorithm marks a node as light if it loses more than a $\lambda$ fraction of its neighbors in the first step; otherwise, it marks the node as heavy. The algorithm then removes all edges of $H$ between two light nodes (Lines 3-6).

  3. The last step outputs the connected components $F$ of the resulting graph.

Algorithm 4 Pre-clustering – Algorithm 1 in [17].
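The body of Algorithm 4 is likewise not included here; as a stand-in, the following minimal Python sketch follows the three steps described above. The light/heavy test and the filtering threshold follow the prose description and may differ in constants from the actual Algorithm 4.

```python
def preclustering(vertices, adj, beta, lam):
    """Pre-clustering in the spirit of [17]: filter dissimilar +edges, drop
    edges between 'light' nodes, and output connected components."""
    deg = {u: len(adj[u]) for u in vertices}

    # Step 1: keep +edges whose endpoints have similar neighborhoods (graph H).
    adj_H = {u: {u} for u in vertices}
    for u in vertices:
        for v in adj[u]:
            if u != v and len(adj[u] ^ adj[v]) <= beta * max(deg[u], deg[v]):
                adj_H[u].add(v)

    # Step 2: a node is light if it lost more than a lambda fraction of its neighbors.
    light = {u for u in vertices if len(adj_H[u]) < (1 - lam) * deg[u]}
    adj_F = {u: {v for v in adj_H[u] if not (u in light and v in light)}
             for u in vertices}

    # Step 3: output the connected components of the remaining graph.
    components, seen = [], set()
    for s in vertices:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj_F[u] - comp)
        seen |= comp
        components.append(comp)
    return components
```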

A detailed analysis of Algorithm 4 along with the formal proof of Theorem 5 is deferred to the full version.