eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2022-09-01
33:1
33:15
10.4230/LIPIcs.ESA.2022.33
article
Approximation Algorithms for Continuous Clustering and Facility Location Problems
Chakrabarty, Deeparnab
1
Negahbani, Maryam
1
Sarkar, Ankita
1
https://orcid.org/0000-0001-6787-7286
Department of Computer Science, Dartmouth College, Hanover, NH, USA
In this paper, we consider center-based clustering problems where C, the set of points to be clustered, lies in a metric space (X,d), and the set X of candidate centers is potentially infinite-sized. We call such problems continuous clustering problems to differentiate them from the discrete clustering problems where the set of candidate centers is explicitly given. It is known that for many objectives, when one restricts the set of centers to C itself and applies an α_{dis}-approximation algorithm for the discrete version, one obtains a β ⋅ α_{dis}-approximation algorithm for the continuous version via the triangle inequality property of the distance function. Here β depends on the objective: for many objectives such as k-median, β = 2, while for some others such as k-means, β = 4. The motivating question in this paper is whether this gap of factor β between continuous and discrete problems is inherent, or whether one can design better algorithms for continuous clustering than simply reducing to the discrete case as described above. In a recent SODA 2021 paper, Cohen-Addad, Karthik, and Lee prove a factor-2 and a factor-4 hardness, respectively, for the continuous versions of the k-median and k-means problems, even when the number of cluster centers is a constant. The discrete problem for a constant number of centers is easily solvable exactly using enumeration, and therefore, in certain regimes, the "β-factor loss" seems unavoidable.
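The β-factor loss from restricting centers to C can be checked numerically in the simplest setting. The following is a minimal sketch and not the paper's method: it assumes a single center (k = 1), points in the Euclidean plane, and hypothetical helper names `cost` and `best_discrete_center`. For any candidate center x, the point of C nearest to x is within distance d(p, x) of x for every p in C, so by the triangle inequality its distance to p is at most 2·d(p, x). Summing gives a factor of 2 for the median objective and 4 for the squared (means) objective.

```python
import math
import random

def cost(points, center, power=1):
    """Sum of d(p, center)**power over all points (power=1: median, power=2: means)."""
    return sum(math.dist(p, center) ** power for p in points)

def best_discrete_center(points, power=1):
    """Best single center when candidates are restricted to the input points (hypothetical helper)."""
    return min(points, key=lambda c: cost(points, c, power))

random.seed(0)
# C: the points to be clustered; candidates: arbitrary centers drawn from the ambient plane.
C = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(50)]
candidates = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(200)]

worst_ratio = {}
for power, beta in ((1, 2), (2, 4)):
    discrete = cost(C, best_discrete_center(C, power), power)
    # Compare the best center inside C against every sampled continuous candidate.
    worst_ratio[power] = max(discrete / cost(C, x, power) for x in candidates)
    assert worst_ratio[power] <= beta + 1e-9
```

The sampled candidates only illustrate the bound; the inequality itself holds for every x in the metric space, which is why the reduction to the discrete problem loses at most a factor β.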
In this paper, we describe a technique based on the round-or-cut framework to approach continuous clustering problems. We show that, for the continuous versions of some clustering problems, we can design approximation algorithms attaining a better factor than the β-factor blow-up mentioned above. In particular, we do so for: the uncapacitated facility location problem with uniform facility opening costs (λ-UFL); the k-means problem; the individually fair k-median problem; and the k-center with outliers problem. Notably, for λ-UFL, where β = 2 and the discrete version is NP-hard to approximate within a factor of 1.27, we describe a 2.32-approximation for the continuous version, and indeed 2.32 < 2 × 1.27. Also, for k-means, where β = 4 and the best known approximation factor for the discrete version is 9, we obtain a 32-approximation for the continuous version, which is better than 4 × 9 = 36.
The main challenge one faces is that most algorithms for the discrete clustering problems, including the state-of-the-art solutions, depend on linear program (LP) relaxations that become infinite-sized in the continuous version. To overcome this, we design new LP relaxations for the continuous clustering problems which, although having exponentially many constraints, are amenable to the round-or-cut framework.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol244-esa2022/LIPIcs.ESA.2022.33/LIPIcs.ESA.2022.33.pdf
Approximation Algorithms
Clustering
Facility Location
Fairness
Outliers