
Simultaneously Approximating All Norms for Massively Parallel Correlation Clustering

Nairen Cao, Department of Computer Science and Engineering, New York University, NY, USA; Shi Li, School of Computer Science, Nanjing University, China; Jia Ye, School of Computer Science, Nanjing University, China
Abstract

We revisit the simultaneous approximation model for the correlation clustering problem introduced by Davies, Moseley, and Newman [21]. The objective is to find a clustering that minimizes a given norm of the disagreement vector over all vertices.

We present an efficient algorithm that produces a clustering that is simultaneously a 63.3-approximation for all monotone symmetric norms. This significantly improves upon the previous approximation ratio of 6348 due to Davies, Moseley, and Newman [21], which works only for $\ell_p$-norms.

To achieve this result, we first reduce the problem to approximating all top-k norms simultaneously, using the connection between monotone symmetric norms and top-k norms established by Chakrabarty and Swamy [11]. Then we develop a novel procedure that constructs a 12.66-approximate fractional clustering for all top-k norms. Our 63.3-approximation ratio is obtained by combining this with the 5-approximate rounding algorithm by Kalhan, Makarychev, and Zhou [26].

We then demonstrate that with a loss of ϵ in the approximation ratio, the algorithm can be adapted to run in nearly linear time and in the MPC (massively parallel computation) model with poly-logarithmic number of rounds.

By allowing a further trade-off in the approximation ratio to (359+ϵ), the number of MPC rounds can be reduced to a constant.

Keywords and phrases:
Correlation Clustering, All-Norms, Approximation Algorithm, Massively Parallel Algorithm
Category:
Track A: Algorithms, Complexity and Games
Funding:
Nairen Cao: Supported by NSF grant CCF-2008422.
Shi Li: Supported by the State Key Laboratory for Novel Software Technology, and the New Cornerstone Science Laboratory.
Jia Ye: Supported by the State Key Laboratory for Novel Software Technology, and the New Cornerstone Science Laboratory.
Copyright and License:
© Nairen Cao, Shi Li, and Jia Ye; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Approximation algorithms analysis; Theory of computation → Massively parallel algorithms
Related Version:
Full Version: https://arxiv.org/abs/2410.09321
Editors:
Keren Censor-Hillel, Fabrizio Grandoni, Joël Ouaknine, and Gabriele Puppis

1 Introduction

Clustering is a classic problem in unsupervised machine learning. It aims to classify a given set of data elements based on their similarities, with the goal of maximizing the similarity between elements within the same class and minimizing the similarity between elements in different classes. Among the various graph clustering problems, correlation clustering stands out as a classic model. Initially proposed by Bansal, Blum, and Chawla [4], the model has numerous practical applications, including automated labelling [10, 1], community detection and mining [15, 31, 30], and disambiguation task [25], among others.

The input of the standard correlation clustering problem is a complete graph over a set $V$ of $n$ vertices, whose edges are partitioned into the set $E^+$ of $+$edges and the set $E^-$ of $-$edges. The output of the problem is a clustering (or a partition) $\mathcal{C}$ of $V$ that minimizes the number of edges in disagreement: an edge $uv\in\binom{V}{2}$ is in disagreement if $uv\in E^+$ but $u$ and $v$ are in different clusters of $\mathcal{C}$, or $uv\in E^-$ but $u$ and $v$ are in the same cluster of $\mathcal{C}$. Throughout the paper, we use a graph $G=(V,E)$ to denote a correlation clustering instance, with $E$ being $E^+$ and $\binom{V}{2}\setminus E$ being $E^-$.

This problem is known to be APX-hard [13]. There has been a long line of $O(1)$-approximation algorithms for the problem [4, 13, 2, 14, 19, 18, 8], with the current best approximation ratio being 1.437 [8]. The same paper also presented an improved hardness bound of 24/23 for the problem, making the inapproximability constant explicit.

Besides the standard setting, other objectives have been studied recently, with the goal of minimizing some norm of the disagreement vector of the clustering $\mathcal{C}$ over vertices. For a clustering $\mathcal{C}$ of $V$, the disagreement vector of $\mathcal{C}$ is defined as $\mathrm{cost}_{\mathcal{C}}\in\mathbb{R}_{\ge 0}^n$, where $\mathrm{cost}_{\mathcal{C}}(u)$ for every $u\in V$ is the number of edges incident to $u$ that are in disagreement with respect to $\mathcal{C}$. Given some norm $f:\mathbb{R}_{\ge 0}^n\to\mathbb{R}_{\ge 0}$ (that is, $f$ satisfies $f(\alpha x)=\alpha f(x)$ for every real $\alpha\ge 0$ and $x\in\mathbb{R}_{\ge 0}^n$, and $f(x+y)\le f(x)+f(y)$ for every $x,y\in\mathbb{R}_{\ge 0}^n$), the goal of the problem is to minimize $f(\mathrm{cost}_{\mathcal{C}})$. Notice that the standard correlation clustering problem corresponds to the case where $f$ is the $\ell_1$ norm.
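To make the objective concrete, here is a small Python sketch (function and variable names are ours, not from the paper) that computes the disagreement vector of a clustering; any norm of interest can then be evaluated on this vector.

```python
from itertools import combinations

def disagreement_vector(vertices, plus_edges, clustering):
    """Return cost_C: for each vertex, the number of incident edges in disagreement.

    `plus_edges` is a set of frozensets {u, v}; every other pair is a -edge.
    `clustering` maps each vertex to a cluster id.
    """
    cost = {u: 0 for u in vertices}
    for u, v in combinations(vertices, 2):
        same_cluster = clustering[u] == clustering[v]
        is_plus = frozenset((u, v)) in plus_edges
        # A +edge across clusters, or a -edge inside a cluster, is in disagreement.
        if is_plus != same_cluster:
            cost[u] += 1
            cost[v] += 1
    return cost

# The l_1 objective is sum(cost.values()); the l_inf objective is max(cost.values()).
```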

Puleo and Milenkovic [29] initiated the study of correlation clustering with the goal of minimizing the $\ell_p$ norm of the disagreement vector, where $p\in[1,\infty]$. They proved that the problem is NP-hard for the $\ell_\infty$-norm objective. For the $\ell_p$-norm objective with any fixed $p\in[1,\infty]$, they gave a 48-approximation algorithm. The approximation ratio was subsequently improved by Charikar, Gupta, and Schwartz [12] to 7 for the $\ell_\infty$-norm, and by Kalhan, Makarychev and Zhou [26] to 5 for the $\ell_p$-norm with any fixed $p\in[1,\infty]$. Very recently, Heidrich, Irmai, and Andres [24] improved the approximation ratio to 4 for the $\ell_\infty$-norm.

Davies, Moseley and Newman [21] introduced the concept of simultaneous approximation for all $\ell_p$-norms. They developed an efficient algorithm that outputs a single clustering $\mathcal{C}$, which is simultaneously an $O(1)$-approximation for the $\ell_p$ norm for every $p\in[1,\infty]$. This is rather surprising, as it was not known a priori whether such a clustering $\mathcal{C}$ even exists. To achieve this, they first construct a fractional clustering $x$ that is simultaneously an $O(1)$-approximation for all $\ell_p$ norms, and then use the 5-approximate rounding algorithm of Kalhan, Makarychev, and Zhou [26] to round $x$ into an integral clustering $\mathcal{C}$. Crucially, the algorithm of [26] guarantees a per-vertex 5-approximation, meaning that $\mathrm{cost}_{\mathcal{C}}(u)$ is at most 5 times the fractional number of edges in disagreement incident to $u$, for every $u\in V$. This strong property is used to obtain the final simultaneous $O(1)$-approximation in [21].

In light of the growing size of real-world networks, it is imperative to develop efficient parallel algorithms. This urgency is particularly pronounced in machine learning and data mining applications, where timely and efficient processing is essential for extracting meaningful insights from vast datasets. Many works in the literature aim to design efficient parallel algorithms [5, 16, 28, 23, 6, 17, 3, 7, 9]. The MPC model, as a theoretical abstraction of several real-world parallel frameworks such as MapReduce [22], is the prevalent model employed in these works.

1.1 Our results

In this paper, we revisit and generalize the simultaneous approximation model for correlation clustering introduced by [21]. Instead of considering only $\ell_p$ norms, we consider all monotone symmetric norms. We say a norm $f:\mathbb{R}_{\ge 0}^n\to\mathbb{R}_{\ge 0}$ is monotone if for every $x,y\in\mathbb{R}_{\ge 0}^n$ with $x\le y$, we have $f(x)\le f(y)$. We say $f$ is symmetric if $f(x)=f(x')$ for every $x,x'\in\mathbb{R}_{\ge 0}^n$ such that $x'$ is a permutation of $x$. Such norms were considered in [11] in the context of load balancing and clustering. Our first result is that a simultaneous $O(1)$-approximation for all monotone symmetric norms exists for correlation clustering, and it can be constructed in polynomial time.

Definition 1.

Given a correlation clustering instance $G=(V,E)$ and $\alpha\ge 1$, we say a clustering $\mathcal{C}$ over $V$ is simultaneously $\alpha$-approximate, or a simultaneous $\alpha$-approximation, for a family $F$ of norms, if we have $f(\mathrm{cost}_{\mathcal{C}})\le\alpha\cdot f(\mathrm{cost}_{OPT_f})$ for every $f\in F$, where $OPT_f$ is the optimum clustering for $G$ under norm $f$.

Theorem 2.

Given a correlation clustering instance G=(V,E), in polynomial time we can construct a simultaneous 63.3-approximate clustering 𝒞 for the family of monotone symmetric norms.

Next, we are concerned with the running time of the algorithm and its implementation under the MPC model. To state the result, we need a formal description of the MPC model.

The MPC model.

In the MPC model, data is distributed across a set of machines, and computation proceeds in synchronous rounds. During each round, each machine first receives messages from other machines, then performs computations based on this information and its own allocated memory, and finally sends messages to other machines to be received at the start of the next round. Each machine has limited local memory, restricting the total number of messages it can receive or send in a round. The efficiency of the algorithm is measured by the number of rounds, the memory used by each machine, the total memory used by all machines, and the running time over all machines, also known as the total work.

In this paper, we consider the MPC model in the strictly sublinear regime: each machine has $O(n^{\delta})$ local memory, where $n$ is the input size and $\delta>0$ is a constant that can be made arbitrarily small. Under this model, we assume the input assigned to each machine has size $O(n^{\delta})$.

We then describe the correlation clustering problem under the MPC model in the strictly sublinear regime. We use $n=|V|$ and $m=|E|$ to denote the number of vertices and edges respectively in the input graph $G=(V,E)$. The edges $E$ are distributed across the machines, each of which has $O(n^{\delta})$ memory for a constant $\delta>0$ that can be made arbitrarily small. At the end of the algorithm, each machine needs to store in its local memory the IDs of the clusters for all the vertices incident to its assigned edges.

Our main result regarding MPC algorithms is as follows.

Theorem 3.

Let $\epsilon\in(0,1)$. There exists a randomized MPC algorithm in the strictly sublinear regime that, given a correlation clustering instance $G=(V,E)$, in $O(\log^3 n)$ rounds outputs a clustering for $G$ that is simultaneously $(63.3+O(\epsilon))$-approximate for all monotone symmetric norms. This algorithm succeeds with high probability. It uses $\widetilde{O}(m/\epsilon^6)$ total memory and $\widetilde{O}(m/\epsilon^6)$ total work. (As usual, we use $\widetilde{O}(\cdot)$ to hide a poly-logarithmic factor in the input size.)

In particular, the algorithm can be converted into a nearly linear time algorithm that, with high probability, outputs a simultaneous $(63.3+O(\epsilon))$-approximation for all monotone symmetric norms.

Along the way, we develop an MPC rounding algorithm with a per-vertex $(5+55\epsilon)$-approximation guarantee, based on the sequential algorithm of [26]. Given its potential independent interest, we state it here for future reference.

Theorem 4.

Let $\epsilon\in(0,1)$ be a constant. Given a graph $G=(V,E)$ and a set of LP values $(x_{uv})_{u,v\in V}$ satisfying the approximate triangle inequality, that is, $x_{uv}+x_{uw}+\epsilon\ge x_{vw}$ for all $u,v,w\in V$, let $y_u=\sum_{uv\in E}x_{uv}+\sum_{uv\in\binom{V}{2}\setminus E}(1-x_{uv})$ be the LP disagreement of node $u$. There exists an MPC algorithm that computes a clustering $\mathcal{C}$ such that for every node $u$, we have

$\mathrm{cost}_{\mathcal{C}}(u)\le(5+55\epsilon)\,y_u.$

This algorithm always succeeds, but terminates in $O(\log^3 n/\epsilon)$ rounds with high probability and requires $O(n^{\delta})$ memory per machine. Moreover, let $K=E\cup\{uv\in\binom{V}{2}\setminus E : x_{uv}<1\}$ be the set of $+$edges and $-$edges whose LP value is less than 1. The algorithm uses a total memory of $O(|K|\log n)$ and a total work of $O(|K|\log^3 n/\epsilon)$.

The $O(\log^3 n)$ round complexity in the above theorem might not be desirable for many applications. Our next result shows that we can reduce the number of rounds to $O(1)$, albeit with a worse $O(1)$ approximation ratio:

Theorem 5.

Let $\epsilon\in(0,1)$ be a constant. There exists a randomized MPC algorithm in the strictly sublinear regime that, given a correlation clustering instance $G=(V,E)$, in $O(1)$ rounds outputs a clustering that is simultaneously a $(359+\epsilon)$-approximation for all monotone symmetric norms. This algorithm succeeds with high probability, and uses a total memory of $\widetilde{O}(m/\epsilon^2)$ and a total work of $\widetilde{O}(m/\epsilon^2)$.

Overall, relative to [21], our algorithms demonstrate the following improvements.

  1. We generalize the family of norms for the simultaneous approximation from $\ell_p$ norms to all monotone symmetric norms.

  2. We obtain a simpler construction, which leads to a much smaller approximation ratio. Using a result from [11], to simultaneously approximate all monotone symmetric norms, it suffices to approximate all top-$k$ norms: the top-$k$ norm of a non-negative vector is the sum of its largest $k$ coordinates. Though mathematically more general, the top-$k$ norms are more convenient to work with than $\ell_p$ norms.

  3. We can make our algorithm run in nearly linear time. This is the first nearly-linear time simultaneous $O(1)$-approximation algorithm for the problem, even when restricted to $\ell_p$ norms. In contrast, the algorithm of [21] runs in nearly linear time only when the graph $G$ has $O(1)$ maximum degree.

  4. We can make our algorithm run in the MPC model with $O(1)$ rounds. Our work is the first to consider the problem in the MPC model.

1.2 Overview of Techniques

We then discuss our techniques for each of our main results.

Polynomial Time Construction of Simultaneous 𝑶(𝟏)-Approximation for All Symmetric Norms.

By [11], we can reduce the problem of approximating all monotone symmetric norms to approximating all top-$k$ norms. We then construct a fractional solution $x$, which is a metric over $V$ with range $[0,1]$, such that the fractional disagreement vector of $x$ has top-$k$ norm at most $12.66\,\mathrm{opt}_k$ for every $k\in[n]$, where $\mathrm{opt}_k$ is the cost of the optimum clustering under the top-$k$ norm. Then, we use the 5-approximate rounding algorithm of KMZ [26] to obtain a simultaneous 63.3-approximation for all top-$k$ norms. The KMZ rounding algorithm has two crucial properties that we need: it does not depend on $k$, and it achieves a per-vertex guarantee.

We elaborate more on how to construct the metric $x:\binom{V}{2}\to[0,1]$. A natural idea for assigning the LP values, used by [21], is to set $x_{uv}$ based on the intersection of the neighborhoods of $u$ and $v$. Intuitively, the more common neighbors two nodes share, the closer they should be. A straightforward way to implement this idea is to set $x_{uv}=1-\frac{|N(u)\cap N(v)|}{\max\{d(u),d(v)\}}$, where $N(u)$ denotes the set of neighbors of $u$ in $G$ and $d(u)=|N(u)|$ denotes the degree of $u$; it is convenient to assume $u\in N(u)$. This approach works for the top-1 norm (i.e., the $\ell_\infty$ norm) as discussed in [20], but fails for the top-$n$ norm (i.e., the $\ell_1$ norm). Consider a star graph, where the optimal clustering under the top-$n$ norm has cost $\Theta(n)$. This approach assigns $x_{uv}=1-\frac{1}{2}=\frac{1}{2}$ to every $-$edge between two leaves, leading to an LP cost of $\Theta(n^2)$ and a gap of $\Omega(n)$. [21] addressed the issue by rounding the LP values of $-$edges incident to a node up to 1 whenever the node's total LP disagreement on $-$edges exceeds the number of its $+$edges. After this transformation, the triangle inequalities are only satisfied approximately, but this can be handled with an $O(1)$ loss in the approximation ratio.

We address this issue with a different approach, inspired by the pre-clustering technique in [17]. We first preprocess the graph $G$ by removing edges $uv\in E$ for which $|N(u)\cap N(v)|$ is small compared to $\max\{d(u),d(v)\}$. Let the resulting graph be $H$. We then set our LP values as $x_{uv}=1-\frac{|N_H(u)\cap N_H(v)|}{\max\{d(u),d(v)\}}$ for $u\ne v$, where $N_H(u)$ is the set of neighbors of $u$ in $H$. We show that this solution is a 12.66-approximation for all top-$k$ norms simultaneously. Compared to [21], in addition to the improved approximation ratio, we obtain a considerably simpler analysis.

Implementation of Algorithm in Nearly-Linear Time and in MPC Model.

We then proceed to discuss our techniques to improve the running time of the algorithm to nearly-linear. The algorithm contains two parts: the construction of the fractional solution x and the rounding procedure. We discuss the two procedures separately.

Constructing $x$ in nearly linear time poses several challenges. First, the construction of the subgraph $H$ requires us to identify edges $uv\in E$ with small $|N(u)\cap N(v)|$. Second, we cannot explicitly assign $x$ values to all pairs. Finally, to compute $x_{uv}$, we need to compute $|N_H(u)\cap N_H(v)|$.

The first and third challenges can be addressed through sampling, with an $O(\log n)$ factor loss in the running time. To avoid considering too many pairs, we only explicitly consider pairs with length at most $1-\epsilon$. Consequently, for each node $u$ we only need to consider pairs whose other endpoint shares at least an $\epsilon$ fraction of neighbors with $u$. Given that each neighbor of $u$ in $H$ has degree similar to that of $u$, we show that there are at most $O(d(u)/\epsilon)$ such pairs for each node $u$. Overall, there are $\widetilde{O}(m\cdot\mathrm{poly}(1/\epsilon))$ pairs for which we need to explicitly assign $x$ values. Moreover, the nearly-linear time algorithm for constructing $x$ can be naturally implemented in the MPC model with an $O(1)$ number of rounds.
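As an illustration of the sampling idea (not the exact procedure of Algorithm 2), the following hedged Python sketch estimates the neighborhood-overlap fraction $|N(u)\cap N(v)|/d(u)$ by sampling neighbors of $u$; the sample size $\Theta(\log n/\epsilon^2)$ and all names are our own choices.

```python
import random, math

def estimate_overlap_fraction(adj, u, v, eps, n):
    """Estimate |N(u) ∩ N(v)| / d(u) by sampling neighbors of u.

    `adj[u]` is the neighbor set of u (including u itself, matching the paper's
    convention). The sample size Theta(log n / eps^2) is a standard Chernoff-style
    choice; an additive eps error then holds with high probability.
    """
    samples = max(1, math.ceil(3 * math.log(n) / eps ** 2))
    neighbors_u = list(adj[u])
    hits = sum(1 for _ in range(samples)
               if random.choice(neighbors_u) in adj[v])
    return hits / samples
```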

We then proceed to the rounding algorithm for $x$. We are explicitly given the $x$ values for the $+$edges and for a nearly-linear number of $-$edges; for all other pairs, the $x$ value is 1. The KMZ algorithm works as follows: in each round, it selects a node $u$ as a cluster center and forms a cluster from a ball of some radius around $u$, i.e., it includes all nodes $v$ with $x_{uv}\le\mathrm{radius}$ in the cluster, removes the clustered nodes, and repeats the process on the remaining nodes. This rounding algorithm can easily be implemented in nearly-linear time using a priority queue. This leads to a nearly-linear time simultaneous $O(1)$-approximation for correlation clustering for all monotone symmetric norms.
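A minimal sequential sketch of this ball-growing rounding is given below; the radius 2/5 follows the description in Appendix C, pairs with no stored value are treated as distance 1, and, unlike the KMZ algorithm, centers are picked in arbitrary order (the comment notes where the real algorithm differs).

```python
def round_fractional(vertices, x, radius=0.4):
    """Greedy ball-growing rounding: repeatedly pick an unclustered center and
    cluster all unclustered vertices within `radius` of it.

    `x[frozenset((u, v))]` stores the fractional distance for explicitly kept
    pairs; missing pairs default to distance 1.  This sketch picks centers in
    arbitrary order; the KMZ algorithm picks the center maximizing L(u),
    which is what yields the per-vertex guarantee.
    """
    def dist(u, v):
        return 0.0 if u == v else x.get(frozenset((u, v)), 1.0)

    unclustered = set(vertices)
    clusters = []
    while unclustered:
        center = next(iter(unclustered))
        ball = {v for v in unclustered if dist(center, v) <= radius}
        clusters.append(ball)
        unclustered -= ball
    return clusters
```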

The challenge in implementing the algorithm in the MPC model is its sequential nature. [26] observes that if, in each round, we select as cluster center a node maximizing $L(u)=\sum_{v:\,x_{uv}\le r}(r-x_{uv})$, where $r$ is the ball radius, we can effectively bound each node's algorithmic cost. However, choosing a node that maximizes some objective inherently makes the process sequential. Our key observation is that, instead of selecting the node that maximizes $L(u)$, we can allow some slack: this still yields a reasonable approximation ratio with an additional $1+\epsilon$ overhead, while allowing the selection of multiple nodes as cluster centers, thereby parallelizing the rounding process. In each round, there may be several candidate cluster centers that conflict with each other. To resolve these conflicts, we employ the classical algorithm of Luby [27, 16] to find a maximal independent set, ensuring that no two chosen cluster centers conflict.

Organization.

We give some preliminary remarks in Section 2. In Section 3, we describe our simultaneous $O(1)$-approximation algorithm for correlation clustering for all top-$k$ norms. The reduction from any monotone symmetric norm to top-$k$ norms is deferred to Appendix A. Combining the results leads to a simultaneous $O(1)$-approximation algorithm for all monotone symmetric norms. Then, in Appendices B and C, we show how to run the algorithm in the MPC model with nearly linear work; in particular, Appendix B and Appendix C discuss how to construct the LP solution and how to round it in the MPC model, respectively. The constant round MPC algorithm is described in Appendix D. Theorems 2, 3, 4 and 5 are proved in Section 3, Appendix C, Appendix C, and Appendix D, respectively.

2 Preliminaries

The input to correlation clustering is a complete graph whose edges are partitioned into $+$edges and $-$edges. We use the graph $G=(V,E)$ of $+$edges to denote an instance. Let $n=|V|$ and $m=|E|$. For simplicity, we assume $E$ contains all the $n$ self-loops $uu$, $u\in V$. So, $E$ is the set of $+$edges, and $\binom{V}{2}\setminus E$ is the set of $-$edges. The graph $G$ is fixed for most of the paper.

For any graph $H=(V_H,E_H)$ and any vertex $u\in V_H$, let $N_H(u)=\{v\in V_H : uv\in E_H\}$. For any vertex $u\in V_H$ and any subset $S\subseteq V_H$, we define $d_H(u,S)=\sum_{v\in S}\mathbb{1}(uv\in E_H)$ as the number of edges between $u$ and $S$. We simply write $d_H(u)$ for $d_H(u,V_H)$. When the graph $H$ is the input graph $G$, we omit the subscript; so we use $N(u)$ for $N_G(u)$ and $d(u)$ for $d_G(u)$. Notice that $u\in N(u)$ and $d(u)=|N(u)|\ge 1$ for every $u\in V$. For the input graph $G=(V,E)$ and any two vertices $u,v\in V$, we define $M_{uv}=\max\{d(u),d(v)\}$ as the maximum of the degrees of $u$ and $v$, as this quantity will be used frequently. For any two sets $X$ and $Y$, we denote their symmetric difference by $X\Delta Y$. Our algorithms are parameterized by constants $\beta\in(0,1)$ and $\lambda\in(0,1)$ that will be determined later.

A norm on $n$-dimensional non-negative vectors is a function $f:\mathbb{R}_{\ge 0}^n\to\mathbb{R}_{\ge 0}$ satisfying $f(\alpha x)=\alpha f(x)$ for every real $\alpha\ge 0$ and $x\in\mathbb{R}_{\ge 0}^n$, and $f(x+y)\le f(x)+f(y)$ for every $x,y\in\mathbb{R}_{\ge 0}^n$. We say a norm $f$ is monotone if for every $x,y\in\mathbb{R}_{\ge 0}^n$ with $x\le y$, we have $f(x)\le f(y)$. We say $f$ is symmetric if $f(x)=f(x')$ for every $x,x'\in\mathbb{R}_{\ge 0}^n$ such that $x'$ is a permutation of $x$. We say $f$ is the top-$k$ norm for an integer $k\in[n]$ if $f(x)$ equals the sum of the $k$ largest coordinates of $x$. Chakrabarty and Swamy [11] showed that any monotone symmetric norm can be written as the maximum of a family of ordered norms. This leads to the following lemma, which reduces monotone symmetric norms to top-$k$ norms. For completeness, we defer its proof to Appendix A.

Lemma 6.

If an algorithm returns a single clustering $\mathcal{C}_{ALG}$ that is simultaneously a $\rho$-approximation for the top-$k$ norm objective for every integer $k\in[n]$, then $\mathcal{C}_{ALG}$ is a $\rho$-approximation for every monotone symmetric norm $f:\mathbb{R}_{\ge 0}^n\to\mathbb{R}_+$.

For a fixed clustering $\mathcal{C}$, we already defined the disagreement vector of $\mathcal{C}$ as $\mathrm{cost}_{\mathcal{C}}\in\mathbb{R}_{\ge 0}^n$, with $\mathrm{cost}_{\mathcal{C}}(u)$ for every $u\in V$ being the number of edges incident to $u$ that are in disagreement w.r.t. $\mathcal{C}$. Given an integer $k$ and a clustering $\mathcal{C}$, we define the top-$k$ value as $\mathrm{cost}_{\mathcal{C}}^k=\max_{T\subseteq V,|T|=k}\sum_{u\in T}\mathrm{cost}_{\mathcal{C}}(u)$. Similarly, for any fractional vector $(x_{uv})_{u,v\in V}$, we define $\mathrm{cost}_x(u)=\sum_{uv\in E}x_{uv}+\sum_{uv\in\binom{V}{2}\setminus E}(1-x_{uv})$ as the disagreement of $u$ with respect to $x$. The top-$k$ value of $x$ is defined as $\mathrm{cost}_x^k=\max_{T\subseteq V,|T|=k}\sum_{u\in T}\mathrm{cost}_x(u)$.
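For concreteness, here is a small Python sketch of these fractional quantities (names are ours; pairs not stored in $x$ default to distance 1):

```python
from itertools import combinations

def fractional_disagreement(vertices, plus_edges, x):
    """cost_x(u) = sum of x_uv over +edges at u plus (1 - x_uv) over -edges at u."""
    cost = {u: 0.0 for u in vertices}
    for u, v in combinations(vertices, 2):
        d = x.get(frozenset((u, v)), 1.0)   # default 1 for pairs not explicitly stored
        contrib = d if frozenset((u, v)) in plus_edges else 1.0 - d
        cost[u] += contrib
        cost[v] += contrib
    return cost

def top_k_value(cost, k):
    """Top-k value: sum of the k largest coordinates of the vector."""
    return sum(sorted(cost.values(), reverse=True)[:k])
```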

We will use the following theorem from [26]:

Theorem 7.

Let $G=(V,E)$ be a correlation clustering instance, and let $x\in[0,1]^{\binom{V}{2}}$ be a metric over $V$ with range $[0,1]$. There is a polynomial time algorithm that, given $G$ and $x$, outputs a clustering $\mathcal{C}$ of $V$ such that $\mathrm{cost}_{\mathcal{C}}(u)\le 5\,\mathrm{cost}_x(u)$ for every $u\in V$.

We will use the following well-known concentration inequalities.

Theorem 8 (Chernoff Bound).

Let $X_1,X_2,\ldots,X_k$ be independent random variables taking values in $\{0,1\}$, and let $X=\sum_i X_i$ be their sum. Then the following inequalities hold:

  (a) For any $\epsilon\in(0,1)$, if $\mathbb{E}[X]\le U$, then $\Pr[X\ge(1+\epsilon)U]\le\exp(-\epsilon^2 U/3)$.

  (b) For any $\epsilon\in(0,1)$, if $\mathbb{E}[X]\ge U$, then $\Pr[X\le(1-\epsilon)U]\le\exp(-\epsilon^2 U/2)$.

3 Simultaneous 𝑶(𝟏)-Approximation for Top-𝒌 Norms

In this section, we describe our simultaneous 63.3-approximation for correlation clustering under all top-$k$ norms. The algorithm described in this section runs in polynomial time. It first constructs a solution to the top-$k$ linear program using a combinatorial procedure; crucially, the construction does not depend on the value of $k$. We show that the solution has cost at most 12.66 times the optimum cost under the top-$k$ norm, for every integer $k\ge 1$. Then we use the rounding algorithm of [26] to round the LP solution to an integral one. As it gives a vertex-by-vertex 5-approximation guarantee, this leads to a 63.3-approximation for the top-$k$ norm for every $k$.

The LP for minimizing the top-k norm of the clustering is given in LP (1).

$\min\ \mathrm{cost}_x^k$   (1a)
s.t. $x_{uv}+x_{uw}\ge x_{vw}$,   $\forall u,v,w\in V$,   (1b)
     $x_{uv}\in[0,1]$,   $\forall u,v\in V$,   (1c)
     $x_{uu}=0$,   $\forall u\in V$.   (1d)

In the corresponding integer program, $x_{uv}$ for every $u,v\in V$ indicates whether $u$ and $v$ are separated or not. We view $uv$ as an unordered pair, and thus $x_{uv}$ and $x_{vu}$ are the same variable. So $(x_{uv})_{u,v\in V}$ is a metric with distances in $\{0,1\}$, which is relaxed to $[0,1]$ in the linear program. This is captured by constraints (1b), (1c) and (1d). Notice that $\mathrm{cost}_x(u)$ for any $u\in V$ is a linear function of the $x$ variables. The top-$k$ norm of the fractional clustering is defined by $\mathrm{cost}_x^k=\max_{S\subseteq V:|S|=k}\sum_{u\in S}\mathrm{cost}_x(u)$. This can be captured by introducing a variable $z$ and constraints $z\ge\sum_{u\in S}\mathrm{cost}_x(u)$ for every $S\subseteq V$ of size $k$, and setting $z$ to be the objective to minimize. For simplicity, we use the form as described. Despite having an exponential number of constraints, the LP can be solved efficiently, as there is a simple separation oracle. Moreover, we use a combinatorial algorithm to construct a solution $x$, so our algorithm does not need to solve the LP.
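The separation oracle is particularly simple; a hedged sketch with our own function names: given a candidate $(x,z)$, the most violated constraint for a fixed $k$ comes from the $k$ vertices with the largest $\mathrm{cost}_x$ values.

```python
def separation_oracle(cost_x, z, k):
    """Return the most violated set S (|S| = k) for the constraint
    z >= sum_{u in S} cost_x(u), or None if no such constraint is violated."""
    top = sorted(cost_x, key=cost_x.get, reverse=True)[:k]
    return top if sum(cost_x[u] for u in top) > z else None
```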

3.1 Algorithm

The algorithm for constructing the LP solution $x$ is given in Algorithm 1. It depends on the parameter $\beta\in(0,1)$, whose value will be specified later. During the process, we construct a subgraph $H$ by removing every edge $uv\in E$ whose endpoints $u$ and $v$ have significantly different neighborhoods. We then set $x_{uv}=1-\frac{|N_H(u)\cap N_H(v)|}{M_{uv}}$ if $u\ne v$, and $x_{uv}=0$ otherwise. Recall that $M_{uv}=\max\{d(u),d(v)\}$ is the maximum of the degrees of $u$ and $v$ in the graph $G$. Intuitively, we treat $+$edges as indicators of whether two nodes belong to the same cluster. The first step removes $+$edges whose endpoints should not be in the same cluster. The second step ensures that the more common neighbors two nodes have, the smaller their distance.

Algorithm 1 Construction of norm-oblivious solution x to metric LP.
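The pseudocode of Algorithm 1 does not appear in this version; as a stand-in, here is a hedged Python sketch of the two steps described above, with our own names, and with the edge-removal test written as $|N(u)\Delta N(v)|\le\beta M_{uv}$ (the property used in Lemmas 11 and 12); the exact threshold in the paper's pseudocode may differ.

```python
from itertools import combinations

def construct_lp_solution(vertices, adj, beta):
    """Construct the norm-oblivious fractional solution x.

    `adj[u]` is N(u) in G, including the self-loop u.  Step 1 builds H by
    dropping +edges whose endpoints have very different neighborhoods;
    step 2 sets x_uv = 1 - |N_H(u) ∩ N_H(v)| / max(d(u), d(v)).
    """
    deg = {u: len(adj[u]) for u in vertices}

    # Step 1: keep only +edges whose endpoints have similar neighborhoods.
    adj_H = {u: {u} for u in vertices}          # self-loops stay in H
    for u in vertices:
        for v in adj[u]:
            if u == v:
                continue
            m_uv = max(deg[u], deg[v])
            if len(adj[u] ^ adj[v]) <= beta * m_uv:   # symmetric-difference test
                adj_H[u].add(v)

    # Step 2: assign the fractional distances for all pairs.
    x = {}
    for u, v in combinations(vertices, 2):
        m_uv = max(deg[u], deg[v])
        x[frozenset((u, v))] = 1.0 - len(adj_H[u] & adj_H[v]) / m_uv
    return x
```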

In the remaining part of this section, we will show

Lemma 9.

Algorithm 1 outputs a feasible solution $(x_{uv})_{u,v\in V}$ to LP (1) such that for every integer $k\in[n]$, we have

$\mathrm{cost}_x^k\le 12.66\,\mathrm{opt}_k,$

where $\mathrm{opt}_k$ is the cost of the optimum clustering under the top-$k$ norm.

Proof of Theorem 2.

Given Lemma 9, [26] provides a rounding algorithm for any feasible solution. Let the resulting clustering be $\mathcal{C}_{KMZ}$. [26] ensures that for every node $u$, we have $\mathrm{cost}_{\mathcal{C}_{KMZ}}(u)\le 5\,\mathrm{cost}_x(u)$; that is, the disagreement of each vertex in $\mathcal{C}_{KMZ}$ is at most 5 times its disagreement in the LP solution. We apply the KMZ rounding algorithm to the solution $x$ output by Algorithm 1 (Lemma 9). For any integer $k\in[n]$, let $U$ be the set of $k$ vertices with the largest disagreement values with respect to $\mathcal{C}_{KMZ}$. Then we have:

$\mathrm{cost}_{\mathcal{C}_{KMZ}}^k=\sum_{u\in U}\mathrm{cost}_{\mathcal{C}_{KMZ}}(u)\le 5\sum_{u\in U}\mathrm{cost}_x(u)\le 5\,\mathrm{cost}_x^k\le 5\cdot 12.66\,\mathrm{opt}_k=63.3\,\mathrm{opt}_k.$

By Lemma 6, we know that 𝒞KMZ is a simultaneous 63.3-approximate clustering for all monotone symmetric norms.

We first show that $x$ is feasible in Section 3.2, and then bound the approximation ratio in Section 3.3.

3.2 The validity of 𝒙 to LP (1)

To show that $x$ is a valid solution to LP (1), it suffices to prove that it is a metric over $V$ with range $[0,1]$. Constraints (1c) and (1d) hold trivially, so it remains to show that the triangle inequality (constraint (1b)) is satisfied:

Lemma 10.

For any $u,v,w\in V$, we have $x_{uv}+x_{uw}\ge x_{vw}$.

Proof.

We may assume $u,v,w$ are distinct, since otherwise the inequality holds trivially. We assume w.l.o.g. that $d(v)\ge d(w)$.

$x_{uv}+x_{uw}-x_{vw}$
$=\left(1-\frac{|N_H(u)\cap N_H(v)|}{M_{uv}}\right)+\left(1-\frac{|N_H(u)\cap N_H(w)|}{M_{uw}}\right)-\left(1-\frac{|N_H(v)\cap N_H(w)|}{d(v)}\right)$
$=1+\frac{|N_H(v)\cap N_H(w)|}{d(v)}-\frac{|N_H(u)\cap N_H(v)|}{M_{uv}}-\frac{|N_H(u)\cap N_H(w)|}{M_{uw}}$
$\ge 1+\frac{|N_H(v)\cap N_H(w)|}{M_{uv}}-\frac{|N_H(u)\cap N_H(v)|}{M_{uv}}-\frac{|N_H(u)\cap N_H(w)|}{M_{uw}}$
$\ge 1-\frac{|N_H(u)\setminus N_H(w)|}{M_{uv}}-\frac{|N_H(u)\cap N_H(w)|}{M_{uw}}$
$\ge 1-\frac{|N_H(u)\setminus N_H(w)|+|N_H(u)\cap N_H(w)|}{d(u)}=1-\frac{|N_H(u)|}{d(u)}\ge 0.$

The first inequality uses $d(v)\le M_{uv}$; the second follows from $|N_H(u)\cap N_H(v)|-|N_H(v)\cap N_H(w)|\le|(N_H(u)\cap N_H(v))\setminus(N_H(v)\cap N_H(w))|=|N_H(v)\cap(N_H(u)\setminus N_H(w))|\le|N_H(u)\setminus N_H(w)|$; and the third uses $d(u)\le M_{uv}$, $d(u)\le M_{uw}$ and $|N_H(u)|\le|N(u)|=d(u)$.

3.3 Bounding the Top-𝒌 Norm Cost of 𝒙

In this section, we compare the top-$k$ norm cost of $x$ to $\mathrm{opt}_k$.

Notations.

We fix the integer $k\in[n]$. Let $\mathcal{C}$ be the clustering that minimizes the top-$k$ norm of the disagreement vector; in fact, our analysis works for any clustering. For every $v\in V$, let $C(v)$ be the cluster in $\mathcal{C}$ that contains $v$.

For every $u\in V$, let $\mathrm{cost}_{\mathcal{C}}^+(u)$, $\mathrm{cost}_{\mathcal{C}}^-(u)$ and $\mathrm{cost}_{\mathcal{C}}(u)$ respectively denote the number of $+$edges, $-$edges, and all edges incident to $u$ that are in disagreement in the clustering $\mathcal{C}$. Recall that $\mathrm{cost}_{\mathcal{C}}^k=\max_{S\subseteq V:|S|=k}\sum_{u\in S}\mathrm{cost}_{\mathcal{C}}(u)$ is the top-$k$ norm cost of the clustering $\mathcal{C}$; thus we have $\mathrm{opt}_k=\mathrm{cost}_{\mathcal{C}}^k$.

Let $U$ be the set of $k$ vertices $u$ with the largest $\mathrm{cost}_x(u)$ values, so that $\mathrm{cost}_x^k=\sum_{u\in U}\mathrm{cost}_x(u)$. For clarity, we divide all pairs of vertices into five parts. First, we separate out the parts that are easily bounded by $\mathrm{cost}_{\mathcal{C}}^k$: let $\varphi_1^+$ be the set of $+$edges that are cut in $\mathcal{C}$, and $\varphi_1^-$ the set of $-$edges that are not cut in $\mathcal{C}$. For the remaining $+$edges in $E$ that are not cut in $\mathcal{C}$, it is necessary to use the properties of the $+$edges in $E_H$. To this end, let $\varphi_2^+$ be the set of $+$edges in $E\setminus E_H$ that are not cut in $\mathcal{C}$, and $\varphi_3^+$ the set of $+$edges in $E_H$ that are not cut in $\mathcal{C}$. Finally, we define $\varphi_2^-$ as the set of $-$edges that are cut in $\mathcal{C}$. Formally, we set

$\varphi_1^+:=\{uv\mid uv\in E,\ C(u)\ne C(v)\}$,  $\varphi_2^+:=\{uv\mid uv\in E\setminus E_H,\ C(u)=C(v)\}$,
$\varphi_3^+:=\{uv\mid uv\in E_H,\ C(u)=C(v)\}$,  $\varphi_1^-:=\{uv\mid uv\in\binom{V}{2}\setminus E,\ C(u)=C(v)\}$,
and $\varphi_2^-:=\{uv\mid uv\in\binom{V}{2}\setminus E,\ C(u)\ne C(v)\}$.

For every $(i,j)\in\{(1,+),(2,+),(3,+),(1,-),(2,-)\}$ and $u\in V$, we let $\varphi_i^j(u)$ be the set of pairs in $\varphi_i^j$ incident to $u$. Notice that $\varphi_3^+$ contains all the self-loops. We use $\phi_i^j(u)=\{v : uv\in\varphi_i^j(u)\}$ to denote the end-vertices of the pairs in $\varphi_i^j(u)$ other than $u$; so $|\phi_i^j(u)|=|\varphi_i^j(u)|$. We let $y_i^j(u)$ denote the cost of the pairs in $\varphi_i^j(u)$ in the solution $x$. For every $(i,j)\in\{(1,+),(2,+),(3,+),(1,-),(2,-)\}$, we define $f_i^j=\sum_{u\in U}y_i^j(u)$. Therefore, the top-$k$ norm cost of $x$ is $f_1^++f_2^++f_3^++f_1^-+f_2^-$.

With the notation defined, we can proceed to the analysis. Before that, we present several propositions that will prove useful. We start with a property of the edges in $E\setminus E_H$.

Lemma 11.

For every $uv\in\varphi_2^+$, we have $\mathrm{cost}_{\mathcal{C}}(u)+\mathrm{cost}_{\mathcal{C}}(v)\ge\beta M_{uv}$.

Proof.

Since $C(u)=C(v)$, there are at least $|N(u)\Delta N(v)|$ disagreements incident to $u$ and $v$. Moreover, since $uv\in\varphi_2^+\subseteq E\setminus E_H$, the edge $uv$ was removed when constructing $H$, so $|N(u)\Delta N(v)|\ge\beta M_{uv}$. Therefore, for every $uv\in\varphi_2^+$, we have $\mathrm{cost}_{\mathcal{C}}(u)+\mathrm{cost}_{\mathcal{C}}(v)\ge|N(u)\Delta N(v)|\ge\beta M_{uv}$.

Next, we show that the endpoints of any edge in $E_H$ have similar degrees.

Lemma 12.

For every $uv\in E_H$, we have $(1-\beta)d(v)\le d(u)\le\frac{1}{1-\beta}d(v)$.

Proof.

For every $uv\in E_H$, we have $|N(u)\Delta N(v)|\le\beta M_{uv}$. Therefore,

$\min\{d(u),d(v)\}\ \ge\ |N(u)\cap N(v)|\ =\ |N(u)\cup N(v)|-|N(u)\Delta N(v)|\ \ge\ (1-\beta)\max\{d(u),d(v)\},$

which gives us $(1-\beta)d(v)\le d(u)\le\frac{1}{1-\beta}d(v)$ since $\beta\in(0,1)$.

Since we analyze the top-$k$ norm objective, we bound the cost of the solution $x$ by a weighted sum of the $\mathrm{cost}_{\mathcal{C}}$ values. This key observation is formalized as follows:

Claim 13.

Let $c\in\mathbb{R}_{\ge 0}^n$ be a vector and $\alpha>0$ be such that $\|c\|_\infty\le\alpha$ and $\|c\|_1\le\alpha k$. Then we have

$\sum_{r\in V}c(r)\,\mathrm{cost}_{\mathcal{C}}(r)\ \le\ \alpha\,\mathrm{cost}_{\mathcal{C}}^k.$

Proof.

We have $\|c/\alpha\|_1\le k$ and $\|c/\alpha\|_\infty\le 1$. It is well known that $c/\alpha$ is then a convex combination of $\{0,1\}$-vectors, each of which has at most $k$ ones. For each $\{0,1\}$-vector $d$ in the combination, we have $\sum_{r\in V}d(r)\,\mathrm{cost}_{\mathcal{C}}(r)\le\mathrm{cost}_{\mathcal{C}}^k$. Taking the convex combination of these inequalities gives $\sum_{r\in V}\frac{c(r)}{\alpha}\,\mathrm{cost}_{\mathcal{C}}(r)\le\mathrm{cost}_{\mathcal{C}}^k$, which is equivalent to the inequality in the claim.

From the definitions of $f_1^+$ and $f_1^-$, we can see that their sum is directly bounded by $\mathrm{cost}_{\mathcal{C}}^k$.

Claim 14.

$f_1^++f_1^-\le\mathrm{cost}_{\mathcal{C}}^k$.

Proof.

$f_1^++f_1^-=\sum_{u\in U}\big(y_1^+(u)+y_1^-(u)\big)=\sum_{u\in U}\Big(\sum_{uv\in\varphi_1^+(u)}x_{uv}+\sum_{uv\in\varphi_1^-(u)}(1-x_{uv})\Big)$
$\le\sum_{u\in U}\big(|\varphi_1^+(u)|+|\varphi_1^-(u)|\big)\le\sum_{u\in U}\mathrm{cost}_{\mathcal{C}}(u)\le\mathrm{cost}_{\mathcal{C}}^k.$

To bound the cost of the remaining +edges, we separately analyze the cost coefficients of vertices in f2+ and f3+.

Lemma 15.

There exists a vector $c_2^+\in\mathbb{R}_{\ge 0}^n$ with the following properties:

  (a) $f_2^+\le\sum_{r\in V}c_2^+(r)\,\mathrm{cost}_{\mathcal{C}}(r)$.

  (b) $c_2^+(r)\le\frac{2|\varphi_2^+(r)|}{\beta d(r)}$ for every $r\in V$.

  (c) $\|c_2^+\|_1\le\frac{2}{\beta}\sum_{u\in U}\frac{|\varphi_2^+(u)|}{d(u)}$.

Proof.

We bound $f_2^+$ as follows:

$f_2^+\ \le\sum_{u\in U,\,uv\in\varphi_2^+}1\ \le\sum_{u\in U,\,uv\in\varphi_2^+}\frac{\mathrm{cost}_{\mathcal{C}}(u)+\mathrm{cost}_{\mathcal{C}}(v)}{\beta M_{uv}}\ =:\ \sum_{r\in V}c_2^+(r)\,\mathrm{cost}_{\mathcal{C}}(r).$

The second inequality uses Lemma 11. Therefore, Lemma 15(a) holds.

To show Lemma 15(b), we bound the coefficients of $\mathrm{cost}_{\mathcal{C}}(u)$ and $\mathrm{cost}_{\mathcal{C}}(v)$ separately. If $u\in U$, the coefficient of $\mathrm{cost}_{\mathcal{C}}(u)$ is $\sum_{uv\in\varphi_2^+(u)}\frac{1}{\beta M_{uv}}\le\frac{|\varphi_2^+(u)|}{\beta d(u)}$; if $u\notin U$, the coefficient is 0. The coefficient of $\mathrm{cost}_{\mathcal{C}}(v)$ is $\sum_{u\in U\cap\phi_2^+(v)}\frac{1}{\beta M_{uv}}\le\frac{|\varphi_2^+(v)|}{\beta d(v)}$. Therefore, $c_2^+(r)\le\frac{2|\varphi_2^+(r)|}{\beta d(r)}$.

To bound $\|c_2^+\|_1$, we replace $\mathrm{cost}_{\mathcal{C}}(u)$ and $\mathrm{cost}_{\mathcal{C}}(v)$ by 1. Then $\|c_2^+\|_1=\sum_{u\in U,\,uv\in\varphi_2^+}\frac{2}{\beta M_{uv}}\le\frac{2}{\beta}\sum_{u\in U}\frac{|\varphi_2^+(u)|}{d(u)}$. This proves Lemma 15(c).

Following the analysis of the cost coefficients $c_2^+$ for $f_2^+$, we analyze the cost coefficients for $f_3^+$, whose edges lie in $E_H$.

Lemma 16.

There exists a vector $c_3^+\in\mathbb{R}_{\ge 0}^n$ with the following properties:

  (a) $f_3^+\le\sum_{r\in V}c_3^+(r)\,\mathrm{cost}_{\mathcal{C}}(r)$.

  (b) $c_3^+(r)\le 2\left(\frac{|\varphi_2^+(r)||\varphi_3^+(r)|}{\beta d^2(r)}+\frac{|\varphi_2^+(r)|}{\beta d(r)}+\frac{|\varphi_3^+(r)|}{d(r)}\right)$ for every $r\in V$.

  (c) $\|c_3^+\|_1\le 2\sum_{u\in U}\left(\frac{|\varphi_2^+(u)||\varphi_3^+(u)|}{\beta d^2(u)}+\frac{|\varphi_3^+(u)|}{\beta d(u)}+\frac{|\varphi_3^+(u)|}{d(u)}\right)$.

Proof.

Fix some $uv\in\varphi_3^+\subseteq E_H$ with $u\ne v$. We let $\tilde u=\arg\max_{w\in\{u,v\}}d(w)$, and let $\tilde v$ be the other vertex in $\{u,v\}$. Notice that $d(\tilde u)=M_{uv}$. Then we have

$x_{uv}=1-\frac{|N_H(u)\cap N_H(v)|}{d(\tilde u)}$  (2)
$=\frac{d(\tilde u)-|N_H(u)\cap N_H(v)|}{d(\tilde u)}=\frac{|\varphi_1^+(\tilde u)|+|\varphi_2^+(\tilde u)|+|\varphi_3^+(\tilde u)|-|N_H(\tilde u)\cap N_H(\tilde v)|}{d(\tilde u)}$
$\le\frac{\mathrm{cost}_{\mathcal{C}}^+(\tilde u)+|\varphi_2^+(\tilde u)|+|\varphi_2^+(\tilde v)|+\mathrm{cost}_{\mathcal{C}}^-(\tilde v)}{d(\tilde u)}\le\frac{|\varphi_2^+(u)|+|\varphi_2^+(v)|+\mathrm{cost}_{\mathcal{C}}(u)+\mathrm{cost}_{\mathcal{C}}(v)}{M_{uv}}.$  (3)

We prove the first inequality in (3). Notice that $N_H(\tilde u)\supseteq\phi_3^+(\tilde u)$. Therefore,

$|\varphi_3^+(\tilde u)|-|N_H(\tilde u)\cap N_H(\tilde v)|\ \le\ |\phi_3^+(\tilde u)|-|\phi_3^+(\tilde u)\cap N_H(\tilde v)|$
$=|\phi_3^+(\tilde u)\setminus N_H(\tilde v)|\ \le\ |\phi_2^+(\tilde v)\cup\phi_1^-(\tilde v)|$
$=|\varphi_2^+(\tilde v)\cup\varphi_1^-(\tilde v)|\ \le\ |\varphi_2^+(\tilde v)|+\mathrm{cost}_{\mathcal{C}}^-(\tilde v).$

Above, we used that $\phi_3^+(\tilde u)\setminus N_H(\tilde v)\subseteq C(\tilde u)\setminus N_H(\tilde v)=C(\tilde v)\setminus N_H(\tilde v)\subseteq\phi_2^+(\tilde v)\cup\phi_1^-(\tilde v)$. Also, (3) holds trivially when $u=v$; so it holds for every $uv\in\varphi_3^+$.

Therefore,

$f_3^+=\sum_{u\in U}y_3^+(u)\ \le\sum_{u\in U,\,uv\in\varphi_3^+}\frac{|\varphi_2^+(u)|+|\varphi_2^+(v)|+\mathrm{cost}_{\mathcal{C}}(u)+\mathrm{cost}_{\mathcal{C}}(v)}{M_{uv}}$
$\le\sum_{u\in U,\,uv\in\varphi_3^+,\,uw\in\varphi_2^+}\frac{1}{M_{uv}}+\sum_{u\in U,\,uv\in\varphi_3^+,\,vw\in\varphi_2^+}\frac{1}{M_{uv}}+\sum_{u\in U,\,uv\in\varphi_3^+}\frac{\mathrm{cost}_{\mathcal{C}}(u)+\mathrm{cost}_{\mathcal{C}}(v)}{M_{uv}}$
$\le\sum_{u\in U,\,uv\in\varphi_3^+,\,uw\in\varphi_2^+}\frac{\mathrm{cost}_{\mathcal{C}}(u)+\mathrm{cost}_{\mathcal{C}}(w)}{\beta M_{uv}M_{uw}}+\sum_{u\in U,\,uv\in\varphi_3^+,\,vw\in\varphi_2^+}\frac{\mathrm{cost}_{\mathcal{C}}(v)+\mathrm{cost}_{\mathcal{C}}(w)}{\beta M_{uv}M_{vw}}$
$\quad+\sum_{u\in U,\,uv\in\varphi_3^+}\frac{\mathrm{cost}_{\mathcal{C}}(u)+\mathrm{cost}_{\mathcal{C}}(v)}{M_{uv}}\ =:\ \sum_{r\in V}c_3^+(r)\,\mathrm{cost}_{\mathcal{C}}(r).$

Again, we used Lemma 11 twice for the last inequality. Therefore, Lemma 16(a) holds.

To prove Lemma 16(b), we consider the coefficients of $\mathrm{cost}_{\mathcal{C}}(u)$, $\mathrm{cost}_{\mathcal{C}}(v)$ and $\mathrm{cost}_{\mathcal{C}}(w)$ separately. For $u\in U$, the coefficient of $\mathrm{cost}_{\mathcal{C}}(u)$ is

$\sum_{uv\in\varphi_3^+,\,uw\in\varphi_2^+}\frac{1}{\beta M_{uv}M_{uw}}+\sum_{uv\in\varphi_3^+}\frac{1}{M_{uv}}\ \le\ \frac{|\varphi_3^+(u)||\varphi_2^+(u)|}{\beta d^2(u)}+\frac{|\varphi_3^+(u)|}{d(u)}.$

The coefficient of $\mathrm{cost}_{\mathcal{C}}(v)$ is

$\sum_{u\in U\cap\phi_3^+(v),\,vw\in\varphi_2^+}\frac{1}{\beta M_{uv}M_{vw}}+\sum_{u\in U\cap\phi_3^+(v)}\frac{1}{M_{uv}}\ \le\ \frac{|\varphi_3^+(v)||\varphi_2^+(v)|}{\beta d^2(v)}+\frac{|\varphi_3^+(v)|}{d(v)}.$

The coefficient of $\mathrm{cost}_{\mathcal{C}}(w)$ is at most

$\sum_{u\in U\cap\phi_2^+(w),\,uv\in\varphi_3^+}\frac{1}{\beta M_{uv}M_{uw}}+\sum_{vw\in\varphi_2^+,\,u\in U\cap\phi_3^+(v)}\frac{1}{\beta M_{uv}M_{vw}}$
$\le\sum_{u\in U\cap\phi_2^+(w)}\frac{d(u)}{\beta d(u)M_{uw}}+\sum_{vw\in\varphi_2^+}\frac{d(v)}{\beta d(v)M_{vw}}$
$=\sum_{u\in U\cap\phi_2^+(w)}\frac{1}{\beta M_{uw}}+\sum_{vw\in\varphi_2^+}\frac{1}{\beta M_{vw}}\ \le\ \frac{|\varphi_2^+(w)|}{\beta d(w)}+\frac{|\varphi_2^+(w)|}{\beta d(w)}=\frac{2|\varphi_2^+(w)|}{\beta d(w)}.$

Therefore, for every $r\in V$, we have $c_3^+(r)\le 2\left(\frac{|\varphi_2^+(r)||\varphi_3^+(r)|}{\beta d^2(r)}+\frac{|\varphi_2^+(r)|}{\beta d(r)}+\frac{|\varphi_3^+(r)|}{d(r)}\right)$.

We then bound $\|c_3^+\|_1$ as follows:

$\|c_3^+\|_1=\sum_{u\in U,\,uv\in\varphi_3^+,\,uw\in\varphi_2^+}\frac{2}{\beta M_{uv}M_{uw}}+\sum_{u\in U,\,uv\in\varphi_3^+,\,vw\in\varphi_2^+}\frac{2}{\beta M_{uv}M_{vw}}+\sum_{u\in U,\,uv\in\varphi_3^+}\frac{2}{M_{uv}}$
$\le\sum_{u\in U}\left(\frac{2|\varphi_2^+(u)||\varphi_3^+(u)|}{\beta d(u)\,d(u)}+\frac{2|\varphi_3^+(u)|}{\beta d(u)}+\frac{2|\varphi_3^+(u)|}{d(u)}\right)$
$=2\sum_{u\in U}\left(\frac{|\varphi_2^+(u)||\varphi_3^+(u)|}{\beta d^2(u)}+\frac{|\varphi_3^+(u)|}{\beta d(u)}+\frac{|\varphi_3^+(u)|}{d(u)}\right).$

This finishes the proof of Lemma 16(c), and thus of Lemma 16.

With Lemma 15 and Lemma 16, we can then bound $f_2^++f_3^+$:

Lemma 17.

$f_2^++f_3^+\le\frac{4}{\beta}\,\mathrm{cost}_{\mathcal{C}}^k$.

Proof.

We define $c(r)=c_2^+(r)+c_3^+(r)$ for every $r\in V$. Then,

$c(r)\ \le\ \frac{2|\varphi_2^+(r)|}{\beta d(r)}+2\left(\frac{|\varphi_2^+(r)||\varphi_3^+(r)|}{\beta d^2(r)}+\frac{|\varphi_2^+(r)|}{\beta d(r)}+\frac{|\varphi_3^+(r)|}{d(r)}\right)$
$=2\left(\frac{|\varphi_2^+(r)||\varphi_3^+(r)|}{\beta d^2(r)}+\frac{|\varphi_3^+(r)|}{d(r)}+\frac{2|\varphi_2^+(r)|}{\beta d(r)}\right)$
$\le 2\left(\frac{|\varphi_3^+(r)|}{\beta d(r)}+\frac{|\varphi_3^+(r)|}{\beta d(r)}+\frac{2|\varphi_2^+(r)|}{\beta d(r)}\right)$
$\le\ \frac{4}{\beta},$

where the first inequality holds because of Lemma 15(b) and Lemma 16(b), the second because $|\varphi_2^+(r)|\le d(r)$ and $\beta<1$, and the last because $|\varphi_2^+(r)|+|\varphi_3^+(r)|\le d(r)$.

Similarly, we have

$\|c\|_1\ \le\ \frac{2}{\beta}\sum_{u\in U}\frac{|\varphi_2^+(u)|}{d(u)}+2\sum_{u\in U}\left(\frac{|\varphi_2^+(u)||\varphi_3^+(u)|}{\beta d^2(u)}+\frac{|\varphi_3^+(u)|}{\beta d(u)}+\frac{|\varphi_3^+(u)|}{d(u)}\right)$
$=2\sum_{u\in U}\left(\frac{|\varphi_2^+(u)||\varphi_3^+(u)|}{\beta d^2(u)}+\frac{|\varphi_3^+(u)|}{\beta d(u)}+\frac{|\varphi_3^+(u)|}{d(u)}+\frac{|\varphi_2^+(u)|}{\beta d(u)}\right)$
$\le 2\sum_{u\in U}\left(\frac{|\varphi_2^+(u)|}{\beta d(u)}+\frac{|\varphi_3^+(u)|}{\beta d(u)}+\frac{|\varphi_3^+(u)|}{\beta d(u)}+\frac{|\varphi_2^+(u)|}{\beta d(u)}\right)$
$\le\ \frac{4}{\beta}k,$

where the first inequality holds because of Lemma 15(c) and Lemma 16(c), the second because $\beta<1$ and $|\varphi_3^+(u)|\le d(u)$ for every $u\in U$, and the last because $|\varphi_2^+(u)|+|\varphi_3^+(u)|\le d(u)$ for every $u\in U$.

The lemma then follows from Claim 13 with $\alpha=\frac{4}{\beta}$.

For the cost of the remaining $-$edges, i.e., $f_2^-$, we also analyze the cost coefficient of each vertex.

Lemma 18.

There exists a vector $c_2^-\in\mathbb{R}_{\ge 0}^n$ with the following properties:

  (a) $f_2^-\le\sum_{r\in V}c_2^-(r)\,\mathrm{cost}_{\mathcal{C}}(r)$.

  (b) $c_2^-(r)\le\frac{2}{1-\beta}$ for every $r\in V$.

  (c) $\|c_2^-\|_1\le\frac{2k}{1-\beta}$.

Proof.

We have

$f_2^-=\sum_{u\in U,\,uv\in\varphi_2^-}(1-x_{uv})=\sum_{u\in U,\,uv\in\varphi_2^-}\frac{|N_H(u)\cap N_H(v)|}{M_{uv}}=\sum_{u\in U,\,uv\in\varphi_2^-}\ \sum_{w\in N_H(u)\cap N_H(v)}\frac{1}{M_{uv}}.$

Given that $uv\in\varphi_2^-$ implies $C(u)\ne C(v)$, we distinguish two cases: $C(w)=C(u)$ and $C(w)\ne C(u)$. For simplicity, in the summations below we implicitly impose the constraints $u\in U$, $uv\in\varphi_2^-$ and $uw,vw\in E_H$ whenever the corresponding vertices appear. Notice that we have $d(v)\ge(1-\beta)d(w)$ by Lemma 12.

$\sum_{u,v,w:\,C(w)=C(u)}\frac{1}{M_{uv}}\ \le\ \sum_{u,w:\,C(w)=C(u)}\frac{\mathrm{cost}_{\mathcal{C}}^+(w)}{\max(d(u),(1-\beta)d(w))}.$  (4)
$\sum_{u,v,w:\,C(w)\ne C(u)}\frac{1}{M_{uv}}\ \le\ \sum_{u,w:\,C(w)\ne C(u)}\frac{d(w)}{d(u)}\ \le\ \sum_{u,w:\,C(w)\ne C(u)}\frac{1}{1-\beta}\ \le\ \sum_{u}\frac{\mathrm{cost}_{\mathcal{C}}^+(u)}{1-\beta}.$  (5)

Adding (4) and (5), and using that $\mathrm{cost}_{\mathcal{C}}^+(u)\le\mathrm{cost}_{\mathcal{C}}(u)$ for every $u\in V$, we get

$f_2^-\ \le\sum_{u\in U,\,w\in N_H(u):\,C(w)=C(u)}\frac{\mathrm{cost}_{\mathcal{C}}(w)}{\max(d(u),(1-\beta)d(w))}+\sum_{u\in U}\frac{\mathrm{cost}_{\mathcal{C}}(u)}{1-\beta}\ =:\ \sum_{r\in V}c_2^-(r)\,\mathrm{cost}_{\mathcal{C}}(r).$

Therefore, Lemma 18(a) holds.

To prove Lemma 18(b), we consider the coefficients of $\mathrm{cost}_{\mathcal{C}}(u)$ and $\mathrm{cost}_{\mathcal{C}}(w)$ separately. The coefficient of $\mathrm{cost}_{\mathcal{C}}(u)$ is $\frac{1}{1-\beta}$. The coefficient of $\mathrm{cost}_{\mathcal{C}}(w)$ is

$\sum_{u\in U,\,w\in N_H(u):\,C(w)=C(u)}\frac{1}{\max(d(u),(1-\beta)d(w))}\ \le\sum_{u\in U,\,w\in N_H(u):\,C(w)=C(u)}\frac{1}{(1-\beta)d(w)}\ \le\ \frac{1}{1-\beta}.$

Therefore, for any $r\in V$, we have $c_2^-(r)\le\frac{2}{1-\beta}$.

By replacing $\mathrm{cost}_{\mathcal{C}}(u)$ and $\mathrm{cost}_{\mathcal{C}}(w)$ with 1, we finish the proof of Lemma 18(c) as follows:

$\|c_2^-\|_1\ \le\sum_{u\in U,\,w\in N_H(u):\,C(w)=C(u)}\frac{1}{\max(d(u),(1-\beta)d(w))}+\sum_{u\in U}\frac{1}{1-\beta}$
$\le\sum_{u\in U,\,w\in N_H(u):\,C(w)=C(u)}\frac{1}{d(u)}+\sum_{u\in U}\frac{1}{1-\beta}$
$\le\sum_{u\in U}\frac{|N_H(u)|}{d(u)}+\sum_{u\in U}\frac{1}{1-\beta}\ \le\sum_{u\in U}1+\sum_{u\in U}\frac{1}{1-\beta}\ \le\sum_{u\in U}\frac{2}{1-\beta}\ =\ \frac{2k}{1-\beta}.$

With Lemma 18, we can then bound $f_2^-$.

Lemma 19.

$f_2^-\le\frac{2}{1-\beta}\,\mathrm{cost}_{\mathcal{C}}^k$.

Proof.

By Claim 13 and Lemma 18, we have $f_2^-\le\sum_{r\in V}c_2^-(r)\,\mathrm{cost}_{\mathcal{C}}(r)\le\frac{2}{1-\beta}\,\mathrm{cost}_{\mathcal{C}}^k$.

So the final ratio achieved by Algorithm 1 is $1+\frac{4}{\beta}+\frac{2}{1-\beta}$. Setting $\beta=0.5858$, the ratio is at most 12.66. This finishes the proof of Lemma 9.
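The constant can be checked numerically with a two-line computation; the closed form $\beta=2-\sqrt{2}$ below is our own observation of the minimizer of $4/\beta+2/(1-\beta)$.

```python
import math

beta = 2 - math.sqrt(2)           # ≈ 0.5858, minimizes 4/b + 2/(1 - b)
ratio = 1 + 4 / beta + 2 / (1 - beta)
print(beta, ratio)                # ≈ 0.5858, ≈ 12.657 <= 12.66
```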

References

  • [1] Rakesh Agrawal, Alan Halverson, Krishnaram Kenthapadi, Nina Mishra, and Panayiotis Tsaparas. Generating labels from clicks. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 172–181, 2009. doi:10.1145/1498759.1498824.
  • [2] Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: Ranking and clustering. Journal of the ACM, 55(5):1–27, 2008. doi:10.1145/1411509.1411513.
  • [3] Sepehr Assadi and Chen Wang. Sublinear time and space algorithms for correlation clustering via sparse-dense decompositions. In Proceedings of the 13th Conference on Innovations in Theoretical Computer Science (ITCS), volume 215 of LIPIcs, pages 10:1–10:20, 2022. doi:10.4230/LIPICS.ITCS.2022.10.
  • [4] Nikhil Bansal, Avrim Blum, and Shuchi Chawla. Correlation clustering. Machine learning, 56(1):89–113, 2004. doi:10.1023/B:MACH.0000033116.57574.95.
  • [5] Guy E Blelloch, Jeremy T Fineman, and Julian Shun. Greedy sequential maximal independent set and matching are parallel on average. In Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures, pages 308–317, 2012. doi:10.1145/2312005.2312058.
  • [6] Mélanie Cambus, Davin Choo, Havu Miikonen, and Jara Uitto. Massively parallel correlation clustering in bounded arboricity graphs. In 35th International Symposium on Distributed Computing (DISC), volume 209 of LIPIcs, pages 15:1–15:18, 2021. doi:10.4230/LIPICS.DISC.2021.15.
  • [7] Mélanie Cambus, Fabian Kuhn, Etna Lindy, Shreyas Pai, and Jara Uitto. A (3+ε)-approximate correlation clustering algorithm in dynamic streams. In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2861–2880. SIAM, 2024. doi:10.1137/1.9781611977912.101.
  • [8] Nairen Cao, Vincent Cohen-Addad, Euiwoong Lee, Shi Li, Alantha Newman, and Lukas Vogl. Understanding the cluster linear program for correlation clustering. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 1605–1616, 2024. doi:10.1145/3618260.3649749.
  • [9] Nairen Cao, Shang-En Huang, and Hsin-Hao Su. Breaking 3-factor approximation for correlation clustering in polylogarithmic rounds. In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 4124–4154. SIAM, 2024. doi:10.1137/1.9781611977912.143.
  • [10] Deepayan Chakrabarti, Ravi Kumar, and Kunal Punera. A graph-theoretic approach to webpage segmentation. In Proceedings of the 17th International conference on World Wide Web (WWW), pages 377–386, 2008. doi:10.1145/1367497.1367549.
  • [11] Deeparnab Chakrabarty and Chaitanya Swamy. Approximation algorithms for minimum norm and ordered optimization problems. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 126–137, 2019. doi:10.1145/3313276.3316322.
  • [12] Moses Charikar, Neha Gupta, and Roy Schwartz. Local guarantees in graph cuts and clustering. In International Conference on Integer Programming and Combinatorial Optimization (IPCO), pages 136–147, 2017. doi:10.1007/978-3-319-59250-3_12.
  • [13] Moses Charikar, Venkatesan Guruswami, and Anthony Wirth. Clustering with qualitative information. Journal of Computer and System Sciences, 71(3):360–383, 2005. doi:10.1016/J.JCSS.2004.10.012.
  • [14] Shuchi Chawla, Konstantin Makarychev, Tselil Schramm, and Grigory Yaroslavtsev. Near optimal LP rounding algorithm for correlation clustering on complete and complete k-partite graphs. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC), pages 219–228, 2015. doi:10.1145/2746539.2746604.
  • [15] Yudong Chen, Sujay Sanghavi, and Huan Xu. Clustering sparse graphs. In Advances in Neural Information Processing Systems (Neurips), pages 2204–2212, 2012.
  • [16] Flavio Chierichetti, Nilesh Dalvi, and Ravi Kumar. Correlation clustering in MapReduce. In Proceedings of the 20th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 641–650, 2014. doi:10.1145/2623330.2623743.
  • [17] Vincent Cohen-Addad, Silvio Lattanzi, Slobodan Mitrović, Ashkan Norouzi-Fard, Nikos Parotsidis, and Jakub Tarnawski. Correlation clustering in constant many parallel rounds. In International Conference on Machine Learning, pages 2069–2078. PMLR, 2021. URL: http://proceedings.mlr.press/v139/cohen-addad21b.html.
  • [18] Vincent Cohen-Addad, Euiwoong Lee, Shi Li, and Alantha Newman. Handling correlated rounding error via preclustering: A 1.73-approximation for correlation clustering. In Proceedings of the 64th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2023.
  • [19] Vincent Cohen-Addad, Euiwoong Lee, and Alantha Newman. Correlation clustering with Sherali-Adams. In Proceedings of 63rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 651–661, 2022. doi:10.1109/FOCS54457.2022.00068.
  • [20] Sami Davies, Benjamin Moseley, and Heather Newman. Fast combinatorial algorithms for min max correlation clustering. arXiv preprint arXiv:2301.13079, 2023. doi:10.48550/arXiv.2301.13079.
  • [21] Sami Davies, Benjamin Moseley, and Heather Newman. Simultaneously approximating all lp-norms in correlation clustering. In 51st International Colloquium on Automata, Languages, and Programming, ICALP 2024, 2024.
  • [22] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008. doi:10.1145/1327452.1327492.
  • [23] Manuela Fischer and Andreas Noever. Tight analysis of parallel randomized greedy mis. ACM Transactions on Algorithms (TALG), 16(1):1–13, 2019. doi:10.1145/3326165.
  • [24] Holger SG Heidrich, Jannik Irmai, and Bjoern Andres. A 4-approximation algorithm for min max correlation clustering. In AISTATS, pages 1945–1953, 2024.
  • [25] Dmitri V. Kalashnikov, Zhaoqi Chen, Sharad Mehrotra, and Rabia Nuray-Turan. Web people search via connection analysis. IEEE Transactions on Knowledge and Data Engineering, 20(11):1550–1565, 2008. doi:10.1109/TKDE.2008.78.
  • [26] Sanchit Kalhan, Konstantin Makarychev, and Timothy Zhou. Correlation clustering with local objectives. Advances in Neural Information Processing Systems, 32, 2019.
  • [27] Michael Luby. A simple parallel algorithm for the maximal independent set problem. In Proceedings of the seventeenth annual ACM symposium on Theory of computing, pages 1–10, 1985. doi:10.1145/22145.22146.
  • [28] Xinghao Pan, Dimitris S. Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, and Michael I. Jordan. Parallel correlation clustering on big graphs. In Advances in Neural Information Processing Systems (Neurips), pages 82–90, 2015. URL: https://proceedings.neurips.cc/paper/2015/hash/b53b3a3d6ab90ce0268229151c9bde11-Abstract.html.
  • [29] Gregory Puleo and Olgica Milenkovic. Correlation clustering and biclustering with locally bounded errors. In International Conference on Machine Learning, pages 869–877. PMLR, 2016. URL: http://proceedings.mlr.press/v48/puleo16.html.
  • [30] Jessica Shi, Laxman Dhulipala, David Eisenstat, Jakub Łącki, and Vahab Mirrokni. Scalable community detection via parallel correlation clustering. arXiv preprint arXiv:2108.01731, 2021.
  • [31] Nate Veldt, David F. Gleich, and Anthony Wirth. A correlation clustering framework for community detection. In Proceedings of the 2018 ACM World Wide Web Conference (WWW), pages 439–448, 2018. doi:10.1145/3178876.3186110.

Appendix A Reduction from All Monotone Symmetric Norms to Top-𝒌 Norms

Definition 20 (Ordered Norms).

For any vector $x\in\mathbb{R}_{\ge 0}^n$, let $x^{\downarrow}$ denote the vector $x$ with its coordinates sorted in non-increasing order. Given a weight vector $w\in\mathbb{R}_{\ge 0}^n$ with $w_1\ge w_2\ge\cdots\ge w_n$, the $w$-ordered norm of $x$ is defined as $\mathrm{order}(w;x)=\sum_{i=1}^n w_i x_i^{\downarrow}$.

Lemma 21 (Lemma 5.2 of [11]).

For any monotone and symmetric norm $f:\mathbb{R}_+^n\to\mathbb{R}_+$, define the set $\mathbb{B}_+(f):=\{x\in\mathbb{R}_+^n : f(x)\le 1\}$, and $W=\{w\in\mathbb{R}_+^n : w_1\ge w_2\ge\cdots\ge w_n,\ w\text{ is a subgradient of }f\text{ at some }x\in\mathbb{B}_+(f)\}$. Then we have $f(x)=\max_{w\in W}\mathrm{order}(w;x)$ for every $x\in\mathbb{R}_{\ge 0}^n$.

We restate Lemma 6, which we prove below.

Proof of Lemma 6.

For any $w=(w_1,w_2,\ldots,w_n)$ with $w_1\ge w_2\ge\cdots\ge w_n\ge 0$, define $w'=(w'_1,w'_2,\ldots,w'_n)$ by

$w'_i=\begin{cases}w_i-w_{i+1} & i\in[1,n-1]\\ w_n & i=n\end{cases}$

Let $\mathrm{top}_k(x)$ denote the top-$k$ norm of $x$. Then we have $\mathrm{order}(w;x)=\sum_{k=1}^n w'_k\,\mathrm{top}_k(x)$.

Let $\mathbb{B}_+(f):=\{x\in\mathbb{R}_+^n : f(x)\le 1\}$ and $W=\{w\in\mathbb{R}_+^n : w_1\ge w_2\ge\cdots\ge w_n,\ w\text{ is a subgradient of }f\text{ at some }x\in\mathbb{B}_+(f)\}$. We construct a new set $W'=\{w'\mid w\in W\}$, where $w'=(w_1-w_2,\,w_2-w_3,\,\ldots,\,w_{n-1}-w_n,\,w_n)$ as above. By Lemma 21, we have $f(x)=\max_{w\in W}\mathrm{order}(w;x)=\max_{w'\in W'}\sum_{k=1}^n w'_k\,\mathrm{top}_k(x)$.

Let $y$ be the disagreement vector of the given clustering $\mathcal{C}_{ALG}$. For any monotone symmetric norm $f:\mathbb{R}_{\ge 0}^n\to\mathbb{R}_{\ge 0}$, let $y^f$ be the disagreement vector of the optimal clustering under the norm $f$. By the assumption that $\mathcal{C}_{ALG}$ is a simultaneous $\rho$-approximation for every top-$k$ norm, we have $\mathrm{top}_k(y)\le\rho\,\mathrm{top}_k(y^{\text{top-}k})$ for every $k\in[n]$, where $y^{\text{top-}k}$ is the disagreement vector of the optimal clustering under the top-$k$ objective. Now we bound $f(y)$ in terms of $f(y^f)$ for any monotone symmetric norm $f$:

$f(y)=\max_{w'\in W'}\sum_{k=1}^n w'_k\,\mathrm{top}_k(y)\ \le\ \max_{w'\in W'}\sum_{k=1}^n w'_k\,\rho\,\mathrm{top}_k(y^{\text{top-}k})$
$=\rho\max_{w'\in W'}\sum_{k=1}^n w'_k\,\mathrm{top}_k(y^{\text{top-}k})\ \le\ \rho\max_{w'\in W'}\sum_{k=1}^n w'_k\,\mathrm{top}_k(y^f)\ =\ \rho f(y^f).$
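The telescoping identity $\mathrm{order}(w;x)=\sum_k w'_k\,\mathrm{top}_k(x)$ used in the proof is easy to check numerically; a small self-contained sketch (example numbers are arbitrary):

```python
def order_norm(w, x):
    """order(w; x) = sum_i w_i * x_sorted_desc_i, with w non-increasing."""
    xs = sorted(x, reverse=True)
    return sum(wi * xi for wi, xi in zip(w, xs))

def top_k(x, k):
    return sum(sorted(x, reverse=True)[:k])

w = [5.0, 3.0, 3.0, 1.0]                 # non-increasing weights
x = [2.0, 7.0, 1.0, 4.0]
w_prime = [w[i] - w[i + 1] for i in range(len(w) - 1)] + [w[-1]]
lhs = order_norm(w, x)
rhs = sum(w_prime[k] * top_k(x, k + 1) for k in range(len(w)))
assert abs(lhs - rhs) < 1e-9             # both equal 5*7 + 3*4 + 3*2 + 1*1 = 54
```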

Appendix B Implementing Algorithm 1 in Nearly Linear Time

In this section, we show how to run Algorithm 1 approximately in nearly linear time. Indeed, the algorithm can be implemented in the MPC model with $O(1)$ rounds. More precisely, we will show the following theorem:

Theorem 22.

Let $\epsilon>0$ and $\delta>0$ be small enough constants. Given a graph $G=(V,E)$, there exists an MPC algorithm that computes a solution $\{\tilde x_{uv}\}_{u,v\in V}$ such that

  1. For any integer $k\in[n]$, we have $\mathrm{cost}_{\tilde x}^k\le(12.66+\epsilon)\,\mathrm{opt}_k$.

  2. For any $u,v,w\in V$, we have $\tilde x_{uv}+\tilde x_{uw}+\epsilon\ge\tilde x_{vw}$.

The algorithm succeeds with probability at least $1-1/n$. Moreover, the algorithm runs in $O(1)$ rounds, has a total work of $\widetilde{O}(m/\epsilon^6)$, requires $O(n^{\delta})$ memory per machine, and uses $\widetilde{O}(m/\epsilon^6)$ total memory.

We give the nearly linear time implementation of Algorithm 1 in Algorithm 2. Line 2 constructs the graph $H$ efficiently. Lines 3-6 find the set $K$ of pairs to which we want to assign explicit LP values; any pair $uv\notin K$ simply gets LP value 1. Finally, Lines 7 to 14 set up $\tilde x_{uv}$ so that the conditions of Theorem 22 are satisfied. The full version of this paper contains a complete analysis of Algorithm 2 and the proof of Theorem 22.

Algorithm 2 Nearly Efficient Algorithm for Algorithm 1.

Appendix C Rounding

We will present a nearly linear time rounding algorithm. Furthermore, our algorithm only takes O~(1) rounds in the MPC model. The purpose of this section is to show

We restate Theorem 4 here for convenience.

We emphasize that even if the LP values satisfy the exact triangle inequality rather than an approximate one, the $\epsilon$ terms in the approximation ratio will still be present. These $\epsilon$ terms arise from two sources: the approximate triangle inequality itself, and the inherent characteristics of our MPC algorithm.

Given Theorem 22 and Theorem 4, we are now able to show the main result of this paper.

Proof of Theorem 3.

We first run Algorithm 2, which outputs $\tilde x_{uv}$, and feed it as input to Algorithm 3. Note that by Theorem 22, for any $k\in[n]$ we have

$\mathrm{cost}_{\tilde x}^k\le(12.66+\epsilon)\,\mathrm{opt}_k,$

where $\mathrm{opt}_k$ is the cost of the optimal correlation clustering solution under the top-$k$ norm objective. By Theorem 4, we know that the final clustering $\mathcal{C}$ satisfies

$\mathrm{cost}_{\mathcal{C}}^k\le(5+55\epsilon)\,\mathrm{cost}_{\tilde x}^k\le(63.3+O(\epsilon))\,\mathrm{opt}_k.$

By Lemma 6, we know that 𝒞 is a simultaneous (63.3+O(ϵ))-approximate clustering for all monotone symmetric norms.

For the number of rounds, Algorithm 2 takes $O(1)$ rounds and Algorithm 3 takes $O(\log^3 n/\epsilon)$ rounds, resulting in a total of $O(\log^3 n/\epsilon)$ rounds.

Algorithm 2 requires a total memory of $\widetilde{O}(m/\epsilon^6)$ and a total work of $\widetilde{O}(m/\epsilon^6)$ in the MPC model. By analyzing Algorithm 2, we know that $|K|=O(m\log n/\epsilon)$. Therefore, Algorithm 3 requires a total memory of $\widetilde{O}(m/\epsilon^2)$ and a total work of $\widetilde{O}(m/\epsilon)$ in the MPC model, as established by Theorem 4. In total, the algorithm requires $\widetilde{O}(m/\epsilon^6)$ total memory and $\widetilde{O}(m/\epsilon^6)$ total work in the MPC model.

Rounding Algorithm

Assume that we are given an instance graph $G=(V,E)$ and an LP solution $(x_{uv})_{u,v\in V}$ such that $x_{uv}+x_{uw}+\epsilon\ge x_{vw}$ for all $u,v,w\in V$. Given a set $V_t$ of unclustered vertices, a node $w$ and a radius $r$, define the ball centered at $w$ with radius $r$ as $\mathrm{Ball}_{V_t}(w,r)=\{u\mid u\in V_t,\ x_{wu}\le r\}$. Define $L_t(w)=\sum_{u\in\mathrm{Ball}_{V_t}(w,r)}(r-x_{uw})$. Note that $L_t(w)\ge r$, since $w$ itself is in $\mathrm{Ball}_{V_t}(w,r)$.

Algorithm Description.

Our algorithm works as follows: in each round, we choose a set of cluster centers and form a cluster from the ball of radius $2/5$ around each chosen center. More precisely, at step $t$, let $V_t$ be the set of unclustered vertices. We first compute a value $L_t^{\max}$ such that $L_t(u)<(1+\epsilon)L_t^{\max}$ for every $u\in V_t$. Any node $u$ with $L_t(u)\ge L_t^{\max}$ is added to the set $M_t$ of candidate cluster centers (Lines 6-10). Then, we compute cluster centers by adding each vertex of $M_t$ to the set $S_t$ with probability $p_t$, where the probability is smaller when more vertices of $M_t$ lie in each other's balls of radius $2/5$ (Lines 11-14). After that, to avoid conflicts, we remove some cluster centers from $S_t$ if they are too close to each other, obtaining the final cluster center set $H_t$ (Line 15). Let $F_t=\mathrm{Ball}_{V_t}(H_t,2/5)$ be the set of nodes clustered at step $t$; each $u\in F_t$ is added to the cluster of the center in $H_t$ with minimum ID. We then remove the clustered nodes and repeat the above process until all vertices are clustered.

Algorithm 3 The Rounding Algorithm.
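The body of Algorithm 3 is not included here; the following hedged Python sketch captures one round of the procedure described above. The sampling probability $p$, the conflict rule (greedy by ID instead of Luby's algorithm), and all names are simplifications of ours, not the paper's exact choices.

```python
import random

def one_mpc_round(unclustered, dist, eps, radius=0.4, p=0.5):
    """One round: pick candidate centers with near-maximal L_t, sample them,
    drop conflicting centers, and assign each vertex within `radius` of a
    surviving center to the surviving center with minimum ID."""
    def L(w):
        return sum(radius - dist(w, u) for u in unclustered if dist(w, u) <= radius)

    L_max = max(L(u) for u in unclustered)
    candidates = [u for u in unclustered if L(u) >= L_max / (1 + eps)]
    sampled = [u for u in candidates if random.random() < p]

    centers = []
    for u in sorted(sampled):                       # greedy stand-in for Luby's MIS
        if all(dist(u, c) > 2 * radius for c in centers):
            centers.append(u)

    clusters = {c: [] for c in centers}
    for v in list(unclustered):
        close = [c for c in centers if dist(c, v) <= radius]
        if close:
            clusters[min(close)].append(v)
            unclustered.discard(v)
    return clusters
```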

Complete analysis of Algorithm 3 and proof of Theorem 4 appear in the full version.

Appendix D A Constant Round MPC Algorithm

We will prove Theorem 5 in this section; we restate it here for convenience.

Algorithm

Theorem 3 gives us an $O(\log^3 n)$-round MPC algorithm; the bottleneck is the rounding procedure. To achieve a constant-round MPC algorithm, instead of setting up the LP and rounding it, we use the pre-clustering algorithm from [17], which is very useful for $\ell_1$-norm correlation clustering. We show that the pre-clustering algorithm also provides an $O(1)$-approximation ratio for all monotone symmetric norms simultaneously.

Algorithm Description.

The algorithm from [17] is parameterized by β and λ. It has three steps:

  1. The first step is the same as the first step of Algorithm 1, where we compute the graph $H$ (Line 2).

  2. The algorithm marks a node as light if it loses more than a $\lambda$ fraction of its neighbors in the first step; otherwise, it marks the node as heavy. The algorithm then removes all edges of $H$ between two light nodes (Lines 3-6).

  3. The last step outputs the connected components $F$ of the resulting graph.

Algorithm 4 Pre-clustering – Algorithm 1 in [17].
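The body of Algorithm 4 is likewise not included here; as a stand-in, the following minimal Python sketch follows the three steps described above. The light/heavy test and the filtering threshold follow the prose description and may differ in constants from the actual Algorithm 4.

```python
def preclustering(vertices, adj, beta, lam):
    """Pre-clustering in the spirit of [17]: filter dissimilar +edges, drop
    edges between 'light' nodes, and output connected components."""
    deg = {u: len(adj[u]) for u in vertices}

    # Step 1: keep +edges whose endpoints have similar neighborhoods (graph H).
    adj_H = {u: {u} for u in vertices}
    for u in vertices:
        for v in adj[u]:
            if u != v and len(adj[u] ^ adj[v]) <= beta * max(deg[u], deg[v]):
                adj_H[u].add(v)

    # Step 2: a node is light if it lost more than a lambda fraction of its neighbors.
    light = {u for u in vertices if len(adj_H[u]) < (1 - lam) * deg[u]}
    adj_F = {u: {v for v in adj_H[u] if not (u in light and v in light)}
             for u in vertices}

    # Step 3: output the connected components of the remaining graph.
    components, seen = [], set()
    for s in vertices:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj_F[u] - comp)
        seen |= comp
        components.append(comp)
    return components
```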

A detailed analysis of Algorithm 4 along with the formal proof of Theorem 5 is deferred to the full version.