Abstract
In this paper, we consider center-based clustering problems where C, the set of points to be clustered, lies in a metric space (X, d), and the set X of candidate centers is potentially infinite-sized. We call such problems continuous clustering problems to differentiate them from the discrete clustering problems where the set of candidate centers is explicitly given. It is known that for many objectives, when one restricts the set of centers to C itself and applies an α_dis-approximation algorithm for the discrete version, one obtains a β ⋅ α_dis-approximation algorithm for the continuous version via the triangle inequality property of the distance function. Here β depends on the objective, and for many objectives such as k-median, β = 2, while for some others such as k-means, β = 4. The motivating question in this paper is whether this gap of factor β between continuous and discrete problems is inherent, or can one design better algorithms for continuous clustering than simply reducing to the discrete case as mentioned above? In a recent SODA 2021 paper, Cohen-Addad, Karthik, and Lee prove a factor-2 and a factor-4 hardness, respectively, for the continuous versions of the k-median and k-means problems, even when the number of cluster centers is a constant. The discrete problem for a constant number of centers is easily solvable exactly using enumeration, and therefore, in certain regimes, the "β-factor loss" seems unavoidable.
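As a small illustration of the discrete-restriction reduction described above (this sketch is not from the paper; the point set and the use of Weiszfeld's iteration are illustrative assumptions): for k = 1 under the k-median objective, restricting the candidate centers to the input points C yields a cost at most β = 2 times the continuous optimum, since for any continuous center c* and its nearest input point c', the triangle inequality gives d(x, c') ≤ d(x, c*) + d(c*, c') ≤ 2 d(x, c*).

```python
import math

def dist(p, q):
    # Euclidean distance in the plane.
    return math.hypot(p[0] - q[0], p[1] - q[1])

def cost(points, center):
    # 1-median objective: sum of distances to the chosen center.
    return sum(dist(p, center) for p in points)

def discrete_1_median_cost(points):
    # Discrete restriction: candidate centers are the input points themselves.
    return min(cost(points, c) for c in points)

def continuous_1_median_cost(points, iters=200):
    # Approximate the continuous optimum (geometric median) by
    # Weiszfeld's iteration, starting from the centroid.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iters):
        nx = ny = den = 0.0
        for p in points:
            d = dist(p, (x, y))
            if d < 1e-12:  # skip a point we have (numerically) landed on
                continue
            nx += p[0] / d
            ny += p[1] / d
            den += 1.0 / d
        x, y = nx / den, ny / den
    return cost(points, (x, y))

# Hypothetical example: a near-equilateral triangle, where the continuous
# optimum (the Fermat point) lies strictly inside the point set.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.7)]
disc = discrete_1_median_cost(points)
cont = continuous_1_median_cost(points)
# The triangle-inequality guarantee: discrete cost <= beta * continuous cost.
assert disc <= 2 * cont + 1e-9
```

The same argument, applied cluster by cluster, gives β = 2 for k-median with arbitrary k; for k-means the distances are squared, so the inequality picks up another factor of 2, giving β = 4.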
In this paper, we describe a technique based on the round-or-cut framework to approach continuous clustering problems. We show that, for the continuous versions of some clustering problems, we can design approximation algorithms attaining a better factor than the β-factor blowup mentioned above. In particular, we do so for: the uncapacitated facility location problem with uniform facility opening costs (λ-UFL); the k-means problem; the individually fair k-median problem; and the k-center with outliers problem. Notably, for λ-UFL, where β = 2 and the discrete version is NP-hard to approximate within a factor of 1.27, we describe a 2.32-approximation for the continuous version, and indeed 2.32 < 2 × 1.27. Also, for k-means, where β = 4 and the best known approximation factor for the discrete version is 9, we obtain a 32-approximation for the continuous version, which is better than 4 × 9 = 36.
The main challenge one faces is that most algorithms for discrete clustering problems, including the state-of-the-art solutions, depend on linear programming (LP) relaxations that become infinite-sized in the continuous version. To overcome this, we design new linear program relaxations for the continuous clustering problems which, although having exponentially many constraints, are amenable to the round-or-cut framework.
BibTeX Entry
@InProceedings{chakrabarty_et_al:LIPIcs.ESA.2022.33,
author = {Chakrabarty, Deeparnab and Negahbani, Maryam and Sarkar, Ankita},
title = {{Approximation Algorithms for Continuous Clustering and Facility Location Problems}},
booktitle = {30th Annual European Symposium on Algorithms (ESA 2022)},
pages = {33:1--33:15},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-247-1},
ISSN = {1868-8969},
year = {2022},
volume = {244},
editor = {Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2022/16971},
URN = {urn:nbn:de:0030-drops-169710},
doi = {10.4230/LIPIcs.ESA.2022.33},
annote = {Keywords: Approximation Algorithms, Clustering, Facility Location, Fairness, Outliers}
}
Keywords: Approximation Algorithms, Clustering, Facility Location, Fairness, Outliers
Collection: 30th Annual European Symposium on Algorithms (ESA 2022)
Issue Date: 2022
Date of publication: 01.09.2022