Polynomial-Time Constant-Approximation for Fair Sum-Of-Radii Clustering
Abstract
In a seminal work, Chierichetti et al. [20] introduced the -fair clustering problem: Given a set of red points and a set of blue points in a metric space, a clustering is called fair if the number of red points in each cluster is at most times and at least times the number of blue points in that cluster. The goal is to compute a fair clustering with at most clusters that optimizes certain objective function. Considering this problem, they designed a polynomial-time - and -approximation for the -center and the -median objective, respectively. Recently, Carta et al. [15] studied this problem with the sum-of-radii objective and obtained a -approximation with running time , i.e., fixed-parameter tractable in . Here is the input size. In this work, we design the first polynomial-time -approximation for -fair clustering with the sum-of-radii objective, improving the result of Carta et al. Our result places sum-of-radii in the same group of objectives as -center, that admit polynomial-time -approximations. This result also implies a polynomial-time -approximation for the Euclidean version of the problem, for which an -time -approximation was known due to Drexler et al. [24]. Here is an exponential function of . We are also able to extend our result to any arbitrary number of colors when . This matches known results for the -center and -median objectives in this case. The significant disparity of sum-of-radii compared to -center and -median presents several complex challenges, all of which we successfully overcome in our work. Our main contribution is a novel cluster-merging-based analysis technique for sum-of-radii that helps us achieve the constant-approximation bounds.
Keywords and phrases:
fair clustering, sum-of-radii clustering, approximation algorithmsCopyright and License:
2012 ACM Subject Classification:
Theory of computation Approximation algorithms analysisFunding:
This work was supported by the National Science Foundation under Grant No. AF 2311397.Editors:
Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz HermanSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Given a set of points in a metric space and an integer , the task of clustering is to find a partition of into groups or clusters such that each group has similar points. The similarity of the clusters is typically modeled using an objective function which is to be minimized. In this work, we focus on the sum-of-radii objective, which is defined as the sum of the radii of balls that contain the points of the respective clusters. The sum-of-radii objective, while also center-based, has a different flavor from objectives such as -center, -median, and -means, as it directly sums the radii of the clusters rather than measuring distances from each point to its assigned center. In these objectives, representative points (or cluster centers) are chosen, and the corresponding clusters are formed by assigning the points of to their nearest centers. Such a partition is popularly known as the Voronoi partition. It is not hard to see that an optimal sum-of-radii clustering is not necessarily a Voronoi partition. The study of sum-of-radii was motivated by the idea that it could reduce the so-called dissection effect that is observed in -center type objectives (see attached full version for details).
Sum-of-radii clustering is known to be NP-hard even in planar metrics and metrics of constant doubling dimension [30]. Consequently, it has received substantial attention from the approximation algorithms community. Charikar and Panigrahy [16] designed a Primal-Dual and Lagrangian-relaxation-based 3.504-approximation algorithm that runs in polynomial time (poly-time). Recently, using similar techniques, Friggstad and Jamshidian [26] improved the approximation factor to 3.389. The best-known approximation factor for sum-of-radii in polynomial time is for any , due to Buchem et al. [14]. In stark contrast to other well-studied center-based objectives such as -center and -median, the sum-of-radii objective admits a QPTAS [30], which is based on a randomized metric partitioning scheme. Additionally, the problem can be solved exactly in polynomial time in the Euclidean metric of constant dimension [31]. The problem also admits polynomial time exact algorithms in other restricted settings, such as when singleton clusters are not allowed [9] and the metric is unweighted [33].
In recent years, sum-of-radii clustering has also been studied with additional constraints. One such popular constraint is the capacity constraint, which puts restriction on the number of points that each cluster can contain. In a series of articles [35, 8, 36, 25], -approximation algorithms have been designed for capacitated sum-of-radii with running time fixed-parameter tractable (FPT) in (i.e., for a function of ), culminating in an approximation factor of 3. Inamdar and Varadarajan [35] studied sum-of-radii with a matroid constraint where the set of centers of the balls must be an independent set of a matroid. They obtain an FPT 9-approximation for this problem. The approximation factor has recently been improved to 3 by Chen et al. [18]. Obtaining a poly-time -approximation for any of these constrained versions is an interesting open question. However, poly-time -approximations are known for sum-of-radii with lower bounds and with outliers [3, 14].
Sum-of-radii has also been studied with fairness constraints, which is the main focus of our work. Clustering with fairness constraints or fair clustering stems from the idea that protected groups (defined based on a sensitive feature, e.g., gender) must be well-represented in each cluster. In recent years, fair clustering has received significant attention from researchers across several areas of computer science. In a seminal work, Chierichetti et al. [20] introduced the -fair clustering problem. In this problem, we are given a set of red points, a set of blue points, that together contain points, and an integer balance parameter . A clustering is called -fair if, for any cluster , the number of red points in is at least times and at most times the number of blue points in . We say that each cluster in a -fair clustering is -balanced.
Chierichetti et al. studied -fair clustering with -center and -median objectives, and obtained poly-time 4- and -approximation, respectively. Since then obtaining a poly-time -approximation for -fair median or means remained an intriguing open question. The main challenge in this case is that the optimal clusterings are no longer Voronoi partitions, as they also need to be -fair. Subsequently, -fair median/means has been studied in a plethora of works. The only setting where it is known to obtain a poly-time -approximation is when [12], that is for -fair median/means. These problems have also been considered in the Euclidean case [46, 5].
The -fair median/means problem has also been studied with an arbitrary number of groups. The algorithm of Böhm et al. [12] for also yields a poly-time -approximation in this case. Note that for , a cluster contains the same number of points from all groups. Bandyapadhyay et al. [6] obtained a poly-time approximation for -fair median with a factor that depends on , , and . Bercea et al. [11] and Bera et al. [10] independently defined a generalization of -fair clustering. There we are given balance parameters for each group . A clustering is called fair representational if the fraction of points from group in every cluster is at least and at most for all . They show that it is possible to obtain poly-time bi-criteria type -approximations where we are allowed to violate the fairness constraints by an additive small constant factor. Subsequently, Dai et al. [23] designed a DP-based poly-time -approximation for this problem. For groups, their running time is .
Carta et al. [15] studied fair versions of sum-of-radii. In particular, they study a more general class of mergeable constraints. A clustering constraint is called mergeable if the union of two clusters satisfying the constraint also satisfies the constraint. They show that the fairness constraints defined in -fair clustering and fair representational clustering are mergeable. In their work, they obtained a -approximation for sum-of-radii with mergeable constraints. In particular, for the above two fairness constraints, their run time is , so FPT in . The algorithm iteratively guesses the next cluster based on a -center completion problem leading to the FPT run time. Their approximation factor improves to when . Drexler et al. [24] obtained an FPT -approximation for Euclidean sum-of-radii with mergeable constraints. Chen et al. [18] studied a fair version, which is a special case of matroid sum-of-radii, and hence obtained an FPT 3-approximation. A summary of the results on fair clustering under various objectives is provided in Table 1.
As mentioned before, for fair representational models, only bi-criteria type -approximations are known for -center/median/means, even with two groups. As we focus on our theoretical quest of designing poly-time -approximations fully satisfying the fairness constraints, we study -fair sum-of-radii. In light of the above discussion, we state the following two questions.
- Question :
-
Does -fair sum-of-radii (with two groups) admit a poly-time constant-approximation algorithm?
- Question :
-
Does -fair sum-of-radii with an arbitrary number of groups admit a poly-time constant-approximation algorithm?
After the work of Chierichetti et al. [20], several other notions of fairness have been considered in the context of clustering problems. The following is a sample of these works grouped by the fairness notions: individual fairness [38, 43, 47, 13, 1], proportional fairness [19, 42], fair center representation [17, 40, 39, 21, 34], colorful [7, 37, 4], and min-max fairness [2, 28, 41, 22, 29, 32].
| Objective | Fairness Type | Approximation | Time | Reference |
| -Center | (2 groups) | 4 | Poly | [20] |
| -Median | (2 groups) | Poly | [20] | |
| -Median | ( groups) | Poly | [12] | |
| -Median | ( groups) | Poly | [6] | |
| -Median / Center | Representational (bi-criteria) | Poly | [10, 11] | |
| Sum-of-Radii | Unconstrained | Poly | [14] | |
| Sum-of-Radii | Capacitated | 3 | FPT | [25] |
| Sum-of-Radii | Matroid constraint | 3 | FPT | [18] |
| Sum-of-Radii | (2 groups) | FPT | [15] | |
| Sum-of-Radii | ( groups) | FPT | [15] | |
| Sum-of-Radii | (2 groups) | Poly | This work | |
| Sum-of-Radii | ( groups) | Poly | This work |
1.1 Our Contributions and Techniques
In our work, we prove two theorems resolving Questions 1 and 2 in the affirmative. First, we prove the following theorem.
Theorem 1.
There is a polynomial-time -approximation algorithm for -fair sum-of-radii (with two groups).
Our result complements the FPT approximation result of Carta et al. [15] by achieving the first -approximation for the problem in polynomial time. The result also implies a poly-time -approximation for Euclidean -fair sum-of-radii, for which only an FPT -approximation was known [24]. We note that our result should also be compared with that of -fair -median for which only -approximation is known in polynomial time. In particular, our result places sum-of-radii in the same group of objectives as -center that admits polynomial-time -approximations. Moreover, our result shows that -fair sum-of-radii is in contrast to most of the constrained versions of sum-of-radii, including capacitated clustering, for which only FPT -approximations are known.
Next, we give an overview of our approach. Our approximation algorithm is motivated by the algorithms for -fair center and -fair median [20]. These algorithms have two major steps. In the first step, a fairlet decomposition of the points in is computed, i.e., a partition such that for each fairlet , it either has 1 red point and at most blue points or 1 blue point and at most red points. Let be the function that maps each point to the index of the fairlet that contains . From each , an arbitrary point is designated as its representative. In the second step, a clustering of these representatives is computed with the respective cost function. Also, for each , all of its points are assigned to the cluster that contains . The new clustering is obviously -fair, as each cluster is a merger of fairlets. For the analysis of the cost of the computed clustering, they define a fairlet decomposition cost, which is used to bound the assignment cost of the points in the second step. For -center, this cost is , and for -median, it is . Indeed, both of these costs when optimal are comparable to the optimal -fair clustering cost. For -center, it is within a constant factor, and for -median it is within an factor. Then, it is sufficient to compute a fairlet decomposition in the first step whose cost is within a small constant-factor of the optimal fairlet decomposition cost.
Coming back to -fair sum-of-radii, it is not clear how to define a suitable fairlet decomposition cost that can be compared to the optimal -fair sum-of-radii cost. In particular, such a cost needs to be defined independent of the number of clusters . However, for sum-of-radii, the objective is the sum of radii of clusters. For example, a natural candidate, the cost for -median, i.e., , is likely to be much larger than the optimal sum-of-radii cost. In the absence of such a suitable fairlet decomposition cost, it is difficult to argue the increase in the assignment cost, when actual points of are assigned instead of just the representative .
Our approach.
Our algorithm is surprisingly simple to state. We first compute a complete bi-partite graph with and being the two parts. The weight of each edge is set to be the distance between the two corresponding endpoints. Subsequently, a degree-constrained, spanning subgraph of this graph is computed where each vertex has a degree in range , and the sum of the weights of the edges is minimized. Such an optimal subgraph can be computed in polynomial time using the algorithm of Gabow [27]. Moreover, one can show that such a subgraph is a collection of stars each having at most edges. Thus, our algorithm up to this point is in a similar spirit to that of -median. As we argued before, the total weight of such a subgraph can be very large compared to the optimal sum-of-radii cost. Our main contribution is to prove that there is a sum-of-radii clustering of the stars (or representatives of them) computed in this way whose cost is at most a constant times the optimal -fair sum-of-radii cost. Then, one can compute an approximate sum-of-radii clustering of these stars and return the corresponding clustering of the points in . The obtained clustering is -fair, as the clusters are disjoint union of the vertices of stars, each having at most edges. The proof of the existence of a clustering of the computed stars whose cost is nicely bounded is based on a novel analysis technique that merges a set of optimal clusters to obtain superclusters. We give an overview in the following.
Let be the degree-constrained subgraph computed with the minimum weight possible. Also, let be a fixed optimal -fair sum-of-radii clustering. We repetitively merge pairs of these clusters if there are edges in across them. Let be the resulting clustering. By our construction, each star of is fully contained in one of these merged clusters or superclusters. Thus, it is sufficient to show that the radius of each supercluster is at most times the sum of the radii of the associated optimal clusters whose merger is . To bound such radius, we introduce a notion of minimum-switch paths between pairs of clusters. These paths play a central role in our analysis. We prove that it is possible to bound the (weighted) length of any such path by times the sum of the radii of the associated optimal clusters. Then the diameter (or radius) of the supercluster can also be bounded likewise, as any two cluster vertices are connected by a minimum-switch path. The important distinction is that the length of any arbitrary path might not be bounded in such a nice way.
Next, we prove the following theorem concerning Question 2.
Theorem 2.
There is a polynomial-time -approximation algorithm for -fair sum-of-radii with groups of points.
Again our result directly improves the FPT approximation result of Carta et al. [15] and extends to more than 2 groups. The result matches the known constant-approximation bound for -center/median/means in this case. The proof of the above theorem is similar to the proof of Theorem 1, and so employs the same supercluster-based analysis framework. However, here we need to handle colors. The main challenge boils down to bounding the diameter of a certain multi-partite graph with being the set of vertices. Intuitively, by the analysis for two groups, the diameter of the graphs induced by only is nicely bounded. However, we still need to bound the diameter of . Consequently, we introduce an additional notion of minimum-color-switch paths. We prove that the lengths of these paths can also be bounded nicely, exploiting their special properties.
Organization.
We introduce notation in Section 2 and present our algorithms for -fair and -fair sum-of-radii in Sections 3 and 4, respectively. Proofs of statements marked by are available in the full version111https://arxiv.org/pdf/2504.14683.
2 Preliminaries
In sum-of-radii clustering, we are given a set of points in a metric space with distance and an integer . We would like to find: (i) a subset of containing points and a non-negative integer (called the radius) for each , and (ii) a function assigning each point to a center such that . The subset for each is called the cluster corresponding to having radius . The goal is to find a clustering that minimizes the sum of the radii .
In -fair sum-of-radii clustering, we are given two disjoint groups (red) and (blue) having points in total in a metric space and an integer balance parameter . A clustering is called -fair if, for each cluster , the number of points from in is at least times the number of points from in and at most times the number of points from in . The goal is to compute a -fair clustering minimizing the sum of the radii of the clusters. Each cluster in a -fair clustering is called -balanced.
In Balanced sum-of-radii clustering, we are given disjoint groups having points in total in a metric space such that . A clustering is called balanced if, for each cluster , it holds that . The goal is to compute a balanced clustering that minimizes the sum of the radii of the clusters. We say that each cluster in a balanced clustering is -balanced.
Consider any metric space and a subset . For any cluster and a point , . The center of in is the point, . The radius of w.r.t. and , denoted by , is the distance between and its center in , i.e., . We refer to the sum of the radii, w.r.t. and , of the clusters in any clustering as the cost of w.r.t. and and denote it by cost.
We note that the term “-fairness” refers specifically to the two-color case, where each cluster must maintain a red-to-blue ratio within . This notion does not naturally extend to more than two colors. In contrast, in the multi-color setting with groups, we adopt the term “balanced clustering” (or “-fair clustering with groups”) to describe the setting where each cluster must contain an equal number of points from each group. While we use similar notation for consistency, these two notions are structurally different and should be interpreted accordingly.
3 The Algorithm for -Fair Sum-of-Radii Clustering
In this section, we prove Theorem 1. To set up the stage, we define the following problem.
Min-cost Degree Constrained Subgraph (Min-cost DCS).
A Degree Constrained Subgraph (DCS) of a graph is a subgraph such that the degree of each vertex in is in the range for given integers and . Suppose we are also given a weight function . A min-cost DCS of is a DCS that minimizes the sum of the weights of the edges in over all DCS.
Proposition 3 ([27]).
Min-cost DCS can be solved in time.
The proposition follows from the work of Gabow (Theorem 5.2) [27]. There the stated time complexity is , which is , as each upper-bound can be assumed to be at most the degree of the -th vertex. One technicality is that they study the maximization version (with real weights), but the minimization version can be solved by the standard method of negating edge-weights in min-cost DCS. Also see [45] that has similar discussions and an time algorithm for min-cost DCS, which they call minimum-cost many-to-many matching with demands and capacities.
Observation 4 ().
A min-cost DCS with for all does not contain a path of length three, and thus it is a disjoint union of star graphs.
Our algorithm is as follows.
The Algorithm
-
1.
Construct a graph where and . Define the weight function such that for each edge , . Compute a min-cost DCS of with and for all .
-
2.
Construct an edge-weighted graph in the following way: For each , add a vertex to ; For each star in , add a vertex corresponding to to , which we also call by ; For each , add the edge to with weight ; For all and in , add the edge to with weight . Let be the shortest path metric in . Construct the metric space where is the subset of vertices in corresponding to the stars in .
-
3.
Compute a sum of radii clustering of the points in using the Algorithm of Buchem et al. [14] (with also being the candidate set of centers).
-
4.
Compute a clustering of the points in using in the following way. For each cluster , add the cluster to . Return .
Next, we analyze the algorithm. First, we have the following observations.
Observation 5 ().
is a -fair clustering of .
Next, we analyze the approximation factor. Let be a fixed optimal -fair clustering. We will prove the following lemma. Our result follows as a corollary.
Lemma 6.
Consider the clustering of constructed in Step 3 of the algorithm. Then cost.
Corollary 7 ().
Consider the clustering of constructed in Step 4 of the algorithm. Then cost. Thus, our algorithm is a -approximation algorithm.
3.1 Proof of Lemma 6
In the following, we are going to prove Lemma 6. Consider the min-cost DCS computed in Step 1. Also, consider the optimal clusters in . We construct a new clustering by merging clusters in in the following way, where . Initially, we set to . For each edge of such that and , replace in by their union and denote it by as well.
When the above merging procedure ends, by renaming the indexes, let be the new clustering. Then, we have the following observation.
Observation 8.
Consider any star in . Then, for some , all the points of are contained in .
Consider the clustering of defined in the following way. For each star in , identify the cluster in that contains all the points in . By Observation 8, such an index exists. Assign the point in corresponding to to .
Lemma 9 ().
cost cost.
We will prove the following lemma.
Lemma 10.
cost.
3.2 Proof of Lemma 10
For simplicity of notation, we drop from , as henceforth centers are always assumed to be in and the metric to be . Let us consider any fixed , and suppose it is constructed by merging the clusters . It is sufficient to prove that . For simplicity of notation, we rename by , and by .
Let be the induced subgraph of such that the vertices of are in . We refer to a point of (resp. ) as a red (resp. blue) point. Note that the edges of are across red and blue points. In the following, we construct an edge-weighted, directed multi-graph in the following manner. has a vertex corresponding to each cluster , where . There is an edge from to for each and such that is in . We refer to such an edge as a -edge, i.e., its parity is 0. The weight of the edge is . Similarly, there is a -edge (or parity 1 edge) from to for each and such that is in . The weight of the edge is . For each edge , we denote the corresponding edge in by , where is the red point and is the blue point. For simplicity of exposition, we are going to make heavy use of this correspondence.
Observation 11 ().
Suppose there is a -edge (resp. -edge) in . Then there is also a -edge (resp. -edge) in .
A directed path (or simply a path) from to in is a sequence of distinct vertices such that is in for all . We say that contains the edges . If contains all 0-edges (resp. 1-edges), it is called a 0-path (resp. 1-path). Two consecutive edges on are said to form a switch if they have different parity. We say that the switch happens at and it is the corresponding switching vertex. The switch is called a -switch if the parity of is for . A directed cycle is formed from by adding the edge (if any) with it. The reverse path of is the path that contains the edges for all . Such edges exist according to Observation 11. A 0-path (resp. 1-path) in a subgraph of starting at and ending at is called maximal if does not have any outgoing 0-edges (resp. 1-edges) in the subgraph.
Observation 12 ().
For any two vertices , there is a directed path from to in .
Consider any two vertices and of . Let be a directed path from to having the minimum number of switches, i.e., a minimum-switch path from to .
We prove the following lemma.
Lemma 13.
. Moreover, if does not have a switch, .
Before proving this lemma, we show how to prove Lemma 10. Consider any point in . Let . Now, consider any point in that is the farthest point from . Let . By Lemma 13 it follows that, there is a path, say , from to whose sum of the edge weights is at most . Then,
Summing over all clusters in , we obtain Lemma 10.
3.3 Proof of Lemma 13
The overall idea is to show the existence of a subset of edges , such that the set of edges form a valid degree-constrained subgraph of on the set of vertices . Additionally, we need that the total weight of the edges of is small. Then we can show that the weight of is also small, as is a min-cost DCS of . However, it might not be possible to remove only the edges of from to show the existence of such a set . We show that there is a subset that contains the edges of and can be removed to obtain such a valid degree-constrained subgraph. In the following, we prove that obtaining two such sets and is sufficient to prove Lemma 13. For a set of edges , let .
Lemma 14.
Suppose there are , such that the set of edges forms a valid degree-constrained subgraph of and . Then, .
Proof.
Note that is a min-cost DCS of . Consider the graph induced by the set of edges . By our assumption, is a valid DCS of . It follows that,
The last inequality follows from our assumption.
Assuming that the conditions of the above lemma are true, we finish the proof of Lemma 13.
If does not have a switch, then we will show that . Hence, the moreover part in Lemma 13 also follows. It is left to show the existence of such and .
3.4 Construction of and
Two subgraphs and of are called 0-1-edge-disjoint if for any edge of and of , the corresponding edges in are distinct. Thus, if contains a 0-edge , and and are 0-1-edge-disjoint, then cannot contain the 0-edge and the 1-edge . Similarly, if contains a 1-edge , and and are 0-1-edge-disjoint, then cannot contain the 1-edge and the 0-edge . Let be the indexes of the vertices on where the switches occur. Note that . Denote the switch that occurs at by for all (i.e., is the parity of ). Let be the parity of and be the parity of .
First, we consider the simple case when the parity of is 0 (resp. 1) and there is a 0-path (resp. 1-path) from to in that is 0-1-edge-disjoint from . Let us denote the latter path by . Note that is a path having the minimum number of switches and the existence of ensures that does not have a switch. Let be the subset of edges that lie on the paths in . Next, we define a subset that has a one-to-one mapping with . In particular, consider any edge in . Note that if it is a -edge, it was added due to an edge in such that and . We add the edge to . Otherwise, if is a -edge, it was added due to an edge in such that and . In this case, we add the edge to .
Next, we show the construction of . Wlog, let us assume that is a 0-path. The other case is symmetric. Note that then is also a 0-path as per our assumption. First, we describe the process of adding the replacement edges for the path . Consider any intermediate vertex (if any) on this path. Then, there are exactly two points in corresponding to the edges on , which are of opposite colors. We add an edge between these two points in (see Figure 1). Removal of the edges of corresponding to and the addition of this edge do not change the degree of the two points in . Similarly, we add edges to corresponding to the intermediate vertices of . Next, consider the vertex . There is an incoming 0-edge on and an outgoing 0-edge on that are incident on . Thus, there are exactly two points in of opposite colors corresponding to these two edges. We add an edge between these two points in . Removal of the edges of corresponding to those two edges, and the addition of this edge does not change the degree of the two points in . Similarly, consider the vertex . There is an outgoing 0-edge on and an incoming 0-edge on that are incident on . Thus, there are exactly two points in of opposite colors corresponding to these two edges. We add an edge between these two points in . Again, the removal of the edges of corresponding to those two edges, and the addition of this edge does not change the degree of the two points in . See Figure 1 for an illustration.
By our construction, the set of edges form a valid degree-constrained subgraph of on the set of vertices . The way we add the edges to , both endpoints of each edge lie in a cluster such that the vertex corresponding to the cluster lies on a path in . Now, can lie either on one such path or on two paths. Thus, we add at most two edges to corresponding to . The sum of the weights of these two edges is at most 2 times the diameter of . Hence, by Lemma 14, we obtain
Next, we consider the remaining case when the parity of is 0 (resp. 1) and there is no 0-path (resp. 1-path) from to in that is 0-1-edge-disjoint from . Thus, there is no -path from to in that is 0-1-edge-disjoint from .
Consider a path and let denote its start vertex. Also, consider a cycle such that and have exactly one vertex in common. Note that might not have an edge, in which case . Let be the graph formed by the union of and , i.e., by gluing them together at . We refer to such a graph as a hanging cycle for with being the join vertex. is called a -hanging cycle if all the edges of and are -edges. Let be the point in the cluster corresponding to the edge of the cycle incoming to . Additionally, is called special if the degree of in is at least 2, and is called the special point of (see Figure 2).
In the current case, we need the following lemmas.
Lemma 15 ().
Suppose the parity of is 0 (resp. 1), and there is no 0-path (resp. 1-path) from to in that is 0-1-edge-disjoint from . Moreover, suppose there is no special 0-hanging cycle (resp. 1-hanging cycle) in for that is 0-1-edge-disjoint from . Then, there exists a -path (resp. 1-path) in from to a vertex , such that is 0-1-edge-disjoint from and one of the following is true: (i) the degree of (resp. ) in is at least 2, where is the last edge on if it has an edge or otherwise; or (ii) has a red (resp. blue) point whose degree in is at most .
Lemma 16 ().
Suppose the parity of is 1 (resp. 0), and there is no 1-path (resp. 0-path) from to in that is 0-1-edge-disjoint from . Moreover, suppose there is no special 0-hanging cycle (resp. 1-hanging cycle) in for that is 0-1-edge-disjoint from . Then, there exists a -path (resp. 1-path) from to a vertex , such that is 0-1-edge-disjoint from and one of the following is true: (i) the degree of (resp. ) in is at least 2, where is the last edge on if it has an edge or otherwise, and (ii) has a red (resp. blue) point whose degree in is at most .
Next, we apply the above lemmas to show the construction of and . Recall that in this case there is no -path from to in that is 0-1-edge-disjoint from . If has a switch, then there is no 0-path or 1-path from to in that is 0-1-edge-disjoint from . Otherwise, it must be that , and hence by our assumption, there is no -path from to in that is 0-1-edge-disjoint from . We conclude that in this case (), there is no -path from to in that is 0-1-edge-disjoint from . Then, by Lemma 16 it follows that, either there is a -hanging cycle for in 0-1-edge-disjoint from , or a -path starting from in with special properties. This is true, as the parity of is . We denote this structure by . Additionally, if is a path, we call (resp. ) an anchor point if its degree in is at least 2. Similarly, by Lemma 15 it follows that, either there is a -hanging cycle for in 0-1-edge-disjoint from , or a -path starting from in with special properties. This is true, as is a -edge. We denote this structure by . Additionally, if is a path, we call (resp. ) an anchor point if its degree in is at least 2.
Now, if , then and must be vertex-disjoint. If they are not vertex-disjoint, there exists a -path from to in that is 0-1-edge-disjoint from : take the edges on from to a common vertex and the reverse of , from the common vertex to . These reverse edges have parity opposite of , i.e., the same as . But, by our assumption, such a -path does not exist. Hence, and are vertex-disjoint.
In the other case, . We note that if has no switch, . Thus, if , then we can safely assume that has at least one switch. In this case, suppose there are a -hanging cycle for and a -hanging cycle for in , such that both are 0-1-edge-disjoint, each of the hanging cycles is 0-1-edge-disjoint from , and either the special vertices of both are distinct or the special points are the same and the degree of that point in is at least 3. Then, we take the hanging cycle for as and the one for as . Otherwise, if there is a -hanging cycle for in 0-1-edge-disjoint from or a -hanging cycle for in 0-1-edge-disjoint from , we consider one of those. Assume that the former holds. The other case is symmetric. We take such a hanging cycle for as . Then, one can prove that there is a -path from in with special properties (Lemma 8 in the full version). We take this -path as . Otherwise, there is neither a -hanging cycle for in 0-1-edge-disjoint from nor a -hanging cycle for in 0-1-edge-disjoint from . Then, one can prove that there are two 0-1-edge-disjoint paths with parity , from and , respectively, such that both are also 0-1-edge-disjoint from (Lemma 9 in the full version). In this case, we take the path from as and the path from as .
For all , if there are two 0-1-edge-disjoint -hanging cycles for in that are 0-1-edge-disjoint from and have distinct special points or the same special point of degree at least 3 in , denote them by and . Otherwise, if there is one -hanging cycle for in that is 0-1-edge-disjoint from , denote it by . Now, is a -edge and is a -edge. Then, one can prove that there is a -path starting from in with special properties (Lemma 8 in the full version). Denote this path by . Otherwise, there is no -hanging cycle for in that is 0-1-edge-disjoint from . In this case, one can prove that there are two -paths starting from in with special properties. Denote them by and . Note that in all the cases, , or can either be a path or a hanging cycle.
Construction of .
Let be the subset of edges that lie on the structures in . Next, we define the subset of that has a one-to-one mapping with . In particular, consider any edge in . Note that if it is a -edge, it was added due to an edge in such that and . We add the edge to . Otherwise, if is a -edge, it was added due to an edge in such that and . We again add the edge to .
Our proof is completed by the following two lemmas.
Lemma 17 ().
There is a subset of edges , such that the set of edges form a valid degree-constrained subgraph of on the set of vertices .
4 The Algorithm for Balanced Sum-of-Radii Clustering
In this section, we prove Theorem 2. Recall that we are given disjoint groups having points in total in a metric space , such that .
Our algorithm is as follows.
The Algorithm
-
1.
For each , construct a graph where and . Define the weight function such that for each edge , . Compute a minimum-weight (w.r.t. ) perfect matching of . For each , let be the union of and the points from that are matched to in .
-
2.
Construct an edge-weighted graph in the following way: For each , add a vertex to ; For each , add a vertex corresponding to to , which we also call by ; For each , add the edge to with weight ; For all and , add the edge to with weight . Let be the shortest path metric in . Construct the metric space where is the subset of vertices in .
-
3.
Compute a sum of radii clustering of the points in using the Algorithm of Buchem et al. [14] (with also being the candidate set of centers).
-
4.
Compute a clustering of the points in using in the following way. For each cluster , add the cluster to . Return .
Let be a fixed optimal balanced clustering.
We have the following lemma. Our main result follows as a corollary.
Lemma 18 ().
Consider the clustering of constructed in Step 3 of the algorithm. Then cost.
Corollary 19 ().
Consider the clustering of constructed in Step 3 of the algorithm. Then cost. Thus, our algorithm is a -approximation algorithm.
References
- [1] Anders Aamand, Justin Y. Chen, Allen Liu, Sandeep Silwal, Pattara Sukprasert, Ali Vakilian, and Fred Zhang. Constant approximation for individual preference stable clustering. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
- [2] Mohsen Abbasi, Aditya Bhaskara, and Suresh Venkatasubramanian. Fair clustering via equitable group representations. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAccT), pages 504–514, 2021. doi:10.1145/3442188.3445913.
- [3] Sara Ahmadian and Chaitanya Swamy. Approximation Algorithms for Clustering Problems with Lower Bounds and Outliers. In Ioannis Chatzigiannakis, Michael Mitzenmacher, Yuval Rabani, and Davide Sangiorgi, editors, 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016), volume 55 of Leibniz International Proceedings in Informatics (LIPIcs), pages 69:1–69:15, Dagstuhl, Germany, 2016. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ICALP.2016.69.
- [4] Georg Anegg, Haris Angelidakis, Adam Kurpisz, and Rico Zenklusen. A technique for obtaining true approximations for -center with covering constraints. In International conference on integer programming and combinatorial optimization, pages 52–65. Springer, 2020. doi:10.1007/978-3-030-45771-6_5.
- [5] Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber, Ali Vakilian, and Tal Wagner. Scalable fair clustering. In International Conference on Machine Learning, pages 405–413, 2019. URL: http://proceedings.mlr.press/v97/backurs19a.html.
- [6] Sayan Bandyapadhyay, Eden Chlamtáč, Yury Makarychev, and Ali Vakilian. A polynomial-time approximation for pairwise fair -median clustering. arXiv preprint arXiv:2405.10378, 2024. doi:10.48550/arXiv.2405.10378.
- [7] Sayan Bandyapadhyay, Tanmay Inamdar, Shreyas Pai, and Kasturi Varadarajan. A constant approximation for colorful -center. In 27th Annual European Symposium on Algorithms (ESA 2019). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019.
- [8] Sayan Bandyapadhyay, William Lochet, and Saket Saurabh. FPT constant-approximations for capacitated clustering to minimize the sum of cluster radii. In Erin W. Chambers and Joachim Gudmundsson, editors, 39th International Symposium on Computational Geometry, SoCG 2023, June 12-15, 2023, Dallas, Texas, USA, volume 258 of LIPIcs, pages 12:1–12:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPICS.SOCG.2023.12.
- [9] Babak Behsaz and Mohammad R. Salavatipour. On minimum sum of radii and diameters clustering. Algorithmica, 73(1):143–165, 2015. doi:10.1007/s00453-014-9907-3.
- [10] Suman Bera, Deeparnab Chakrabarty, Nicolas Flores, and Maryam Negahbani. Fair algorithms for clustering. In Advances in Neural Information Processing Systems, pages 4954–4965, 2019.
- [11] Ioana O Bercea, Martin Groß, Samir Khuller, Aounon Kumar, Clemens Rösner, Daniel R Schmidt, and Melanie Schmidt. On the cost of essentially fair clusterings. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPIcs.APPROX-RANDOM.2019.18.
- [12] Matteo Böhm, Adriano Fazzone, Stefano Leonardi, and Chris Schwiegelshohn. Fair clustering with multiple colors. arXiv preprint arXiv:2002.07892, 2020. arXiv:2002.07892.
- [13] B Brubach, D Chakrabarti, J Dickerson, A Srinivasan, and L Tsepenekas. Fairness, semi-supervised learning, and more: A general framework for clustering with stochastic pairwise constraints. In Proc. Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI), 2021.
- [14] Moritz Buchem, Katja Ettmayr, Hugo KK Rosado, and Andreas Wiese. A ()-approximation algorithm for the minimum sum of radii problem with outliers and extensions for generalized lower bounds. In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1738–1765. SIAM, 2024.
- [15] Lena Carta, Lukas Drexler, Annika Hennes, Clemens Rösner, and Melanie Schmidt. FPT Approximations for Fair -Min-Sum-Radii. In Julián Mestre and Anthony Wirth, editors, 35th International Symposium on Algorithms and Computation (ISAAC 2024), volume 322 of Leibniz International Proceedings in Informatics (LIPIcs), pages 16:1–16:18, Dagstuhl, Germany, 2024. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ISAAC.2024.16.
- [16] Moses Charikar and Rina Panigrahy. Clustering to minimize the sum of cluster diameters. J. Comput. Syst. Sci., 68(2):417–441, 2004. doi:10.1016/j.jcss.2003.07.014.
- [17] Danny Z Chen, Jian Li, Hongyu Liang, and Haitao Wang. Matroid and knapsack center problems. Algorithmica, 75(1):27–52, 2016. doi:10.1007/S00453-015-0010-1.
- [18] Xianrun Chen, Dachuan Xu, Yicheng Xu, and Yong Zhang. Parameterized approximation algorithms for sum of radii clustering and variants. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 20666–20673, 2024. doi:10.1609/AAAI.V38I18.30053.
- [19] Xingyu Chen, Brandon Fain, Liang Lyu, and Kamesh Munagala. Proportionally fair clustering. In International Conference on Machine Learning, pages 1032–1041, 2019. URL: http://proceedings.mlr.press/v97/chen19d.html.
- [20] Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Fair clustering through fairlets. In Advances in Neural Information Processing Systems, pages 5029–5037, 2017. URL: https://proceedings.neurips.cc/paper/2017/hash/978fce5bcc4eccc88ad48ce3914124a2-Abstract.html.
- [21] Ashish Chiplunkar, Sagar Kale, and Sivaramakrishnan Natarajan Ramamoorthy. How to solve fair -center in massive data models. In Proceedings of the International Conference on Machine Learning (ICML), pages 1877–1886, 2020.
- [22] Eden Chlamtáč, Yury Makarychev, and Ali Vakilian. Approximating fair clustering with cascaded norm objectives. In Proceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2664–2683, 2022. doi:10.1137/1.9781611977073.104.
- [23] Zhen Dai, Yury Makarychev, and Ali Vakilian. Fair representation clustering with several protected classes. In FAccT ’22: 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, June 21 - 24, 2022, pages 814–823. ACM, 2022. doi:10.1145/3531146.3533146.
- [24] Lukas Drexler, Annika Hennes, Abhiruk Lahiri, Melanie Schmidt, and Julian Wargalla. Approximating fair -min-sum-radii in euclidean space. In International Workshop on Approximation and Online Algorithms, pages 119–133. Springer, 2023. doi:10.1007/978-3-031-49815-2_9.
- [25] Arnold Filtser and Ameet Gadekar. Fpt approximations for capacitated sum of radii and diameters. arXiv preprint arXiv:2409.04984, 2024. doi:10.48550/arXiv.2409.04984.
- [26] Zachary Friggstad and Mahya Jamshidian. Improved Polynomial-Time Approximations for Clustering with Minimum Sum of Radii or Diameters. In Shiri Chechik, Gonzalo Navarro, Eva Rotenberg, and Grzegorz Herman, editors, 30th Annual European Symposium on Algorithms (ESA 2022), volume 244 of Leibniz International Proceedings in Informatics (LIPIcs), pages 56:1–56:14, Dagstuhl, Germany, 2022. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ESA.2022.56.
- [27] Harold N Gabow. An efficient reduction technique for degree-constrained subgraph and bidirected network flow problems. In Proceedings of the fifteenth annual ACM symposium on Theory of computing, pages 448–456, 1983. doi:10.1145/800061.808776.
- [28] Mehrdad Ghadiri, Samira Samadi, and Santosh S. Vempala. Socially fair -means clustering. In Madeleine Clare Elish, William Isaac, and Richard S. Zemel, editors, FAccT ’21: 2021 ACM Conference on Fairness, Accountability, and Transparency, Virtual Event / Toronto, Canada, March 3-10, 2021, pages 438–448. ACM, 2021. doi:10.1145/3442188.3445906.
- [29] Mehrdad Ghadiri, Mohit Singh, and Santosh S Vempala. Constant-factor approximation algorithms for socially fair -clustering. arXiv preprint arXiv:2206.11210, 2022. doi:10.48550/arXiv.2206.11210.
- [30] Matt Gibson, Gaurav Kanade, Erik Krohn, Imran A. Pirwani, and Kasturi R. Varadarajan. On metric clustering to minimize the sum of radii. Algorithmica, 57(3):484–498, 2010. doi:10.1007/s00453-009-9282-7.
- [31] Matt Gibson, Gaurav Kanade, Erik Krohn, Imran A. Pirwani, and Kasturi R. Varadarajan. On clustering to minimize the sum of radii. SIAM J. Comput., 41(1):47–60, 2012. doi:10.1137/100798144.
- [32] Swati Gupta, Jai Moondra, and Mohit Singh. Which norm is the fairest? approximations for fair facility location across all “”, 2022. arXiv:2211.14873.
- [33] Pinar Heggernes and Daniel Lokshtanov. Optimal broadcast domination in polynomial time. Discret. Math., 306(24):3267–3280, 2006. doi:10.1016/j.disc.2006.06.013.
- [34] Sedjro Salomon Hotegni, Sepideh Mahabadi, and Ali Vakilian. Approximation algorithms for fair range clustering. In International Conference on Machine Learning, pages 13270–13284. PMLR, 2023.
- [35] Tanmay Inamdar and Kasturi R. Varadarajan. Capacitated sum-of-radii clustering: An FPT approximation. In Fabrizio Grandoni, Grzegorz Herman, and Peter Sanders, editors, 28th Annual European Symposium on Algorithms, ESA 2020, September 7-9, 2020, Pisa, Italy (Virtual Conference), volume 173 of LIPIcs, pages 62:1–62:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPIcs.ESA.2020.62.
- [36] Ragesh Jaiswal, Amit Kumar, and Jatin Yadav. FPT approximation for capacitated sum of radii. In Venkatesan Guruswami, editor, 15th Innovations in Theoretical Computer Science Conference, ITCS 2024, January 30 to February 2, 2024, Berkeley, CA, USA, volume 287 of LIPIcs, pages 65:1–65:21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.ITCS.2024.65.
- [37] Xinrui Jia, Kshiteej Sheth, and Ola Svensson. Fair colorful -center clustering. In International Conference on Integer Programming and Combinatorial Optimization, pages 209–222. Springer, 2020. doi:10.1007/978-3-030-45771-6_17.
- [38] Christopher Jung, Sampath Kannan, and Neil Lutz. A center in your neighborhood: Fairness in facility location. In Proceedings of the Symposium on Foundations of Responsible Computing (FORC), pages 5:1–5:15, 2020.
- [39] Matthäus Kleindessner, Pranjal Awasthi, and Jamie Morgenstern. Fair -center clustering for data summarization. In 36th International Conference on Machine Learning, ICML 2019, pages 5984–6003. International Machine Learning Society (IMLS), 2019.
- [40] Ravishankar Krishnaswamy, Shi Li, and Sai Sandeep. Constant approximation for k-median and k-means with outliers via iterative rounding. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 646–659, 2018. doi:10.1145/3188745.3188882.
- [41] Yury Makarychev and Ali Vakilian. Approximation algorithms for socially fair clustering. In Conference on Learning Theory (COLT), pages 3246–3264. PMLR, 2021. URL: http://proceedings.mlr.press/v134/makarychev21a.html.
- [42] Evi Micha and Nisarg Shah. Proportionally fair clustering revisited. In International Colloquium on Automata, Languages, and Programming (ICALP), 2020. doi:10.4230/LIPIcs.ICALP.2020.85.
- [43] Maryam Negahbani and Deeparnab Chakrabarty. Better algorithms for individually fair -clustering. Advances in Neural Information Processing Systems (NeurIPS), 34:13340–13351, 2021. URL: https://proceedings.neurips.cc/paper/2021/hash/6f221fcb5c504fe96789df252123770b-Abstract.html.
- [44] Sina Bagheri Nezhad, Sayan Bandyapadhyay, and Tianzhi Chen. Polynomial-time constant-approximation for fair sum-of-radii clustering, 2025. doi:10.48550/arXiv.2504.14683.
- [45] Fatemeh Rajabi-Alni and Alireza Bagheri. Computing a many-to-many matching with demands and capacities between two sets using the hungarian algorithm. Journal of mathematics, 2023(1):7761902, 2023.
- [46] Melanie Schmidt, Chris Schwiegelshohn, and Christian Sohler. Fair coresets and streaming algorithms for fair k-means. In International Workshop on Approximation and Online Algorithms, pages 232–251. Springer, 2019. doi:10.1007/978-3-030-39479-0_16.
- [47] Ali Vakilian and Mustafa Yalçıner. Improved approximation algorithms for individually fair clustering. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 8758–8779. PMLR, 2022.
