Approximation Algorithms for Continuous Clustering and Facility Location Problems
In this paper, we consider center-based clustering problems where C, the set of points to be clustered, lies in a metric space (X,d), and the set X of candidate centers is potentially infinite-sized. We call such problems continuous clustering problems to differentiate them from the discrete clustering problems where the set of candidate centers is explicitly given. It is known that for many objectives, when one restricts the set of centers to C itself and applies an α_dis-approximation algorithm for the discrete version, one obtains a β ⋅ α_{dis}-approximation algorithm for the continuous version via the triangle inequality property of the distance function. Here β depends on the objective, and for many objectives such as k-median, β = 2, while for some others such as k-means, β = 4. The motivating question in this paper is whether this gap of factor β between continuous and discrete problems is inherent, or can one design better algorithms for continuous clustering than simply reducing to the discrete case as mentioned above? In a recent SODA 2021 paper, Cohen-Addad, Karthik, and Lee prove a factor-2 and a factor-4 hardness, respectively, for the continuous versions of the k-median and k-means problems, even when the number of cluster centers is a constant. The discrete problem for a constant number of centers is easily solvable exactly using enumeration, and therefore, in certain regimes, the "β-factor loss" seems unavoidable.
In this paper, we describe a technique based on the round-or-cut framework to approach continuous clustering problems. We show that, for the continuous versions of some clustering problems, we can design approximation algorithms attaining a better factor than the β-factor blow-up mentioned above. In particular, we do so for: the uncapacitated facility location problem with uniform facility opening costs (λ-UFL); the k-means problem; the individually fair k-median problem; and the k-center with outliers problem. Notably, for λ-UFL, where β = 2 and the discrete version is NP-hard to approximate within a factor of 1.27, we describe a 2.32-approximation for the continuous version, and indeed 2.32 < 2 × 1.27. Also, for k-means, where β = 4 and the best known approximation factor for the discrete version is 9, we obtain a 32-approximation for the continuous version, which is better than 4 × 9 = 36.
The main challenge one faces is that most algorithms for the discrete clustering problems, including the state of the art solutions, depend on Linear Program (LP) relaxations that become infinite-sized in the continuous version. To overcome this, we design new linear program relaxations for the continuous clustering problems which, although having exponentially many constraints, are amenable to the round-or-cut framework.
Approximation Algorithms
Clustering
Facility Location
Fairness
Outliers
Theory of computation~Facility location and clustering
33:1-33:15
Regular Paper
The authors were supported by NSF award #2041920.
https://arxiv.org/abs/2206.15105
We thank the reviewers for their comments, which helped us improve our exposition and exhibit a stronger connection to existing work.
Deeparnab
Chakrabarty
Deeparnab Chakrabarty
Department of Computer Science, Dartmouth College, Hanover, NH, USA
https://www.cs.dartmouth.edu/~deepc/
Maryam
Negahbani
Maryam Negahbani
Department of Computer Science, Dartmouth College, Hanover, NH, USA
https://www.cs.dartmouth.edu/~maryam/
Ankita
Sarkar
Ankita Sarkar
Department of Computer Science, Dartmouth College, Hanover, NH, USA
https://ankita.host.dartmouth.edu
https://orcid.org/0000-0001-6787-7286
10.4230/LIPIcs.ESA.2022.33
Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for k-means and euclidean k-median by primal-dual algorithms. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 61-72, 2017.
Soroush Alamdari and David B. Shmoys. A bicriteria approximation algorithm for the k-center and k-median problems. In Proc., Workshop on Approximation and Online Algorithms (WAOA), pages 66-75. Springer, 2017.
Hyung-Chan An, Mohit Singh, and Ola Svensson. LP-based algorithms for capacitated facility location. SIAM Journal on Computing (SICOMP), 46(1):272-306, 2017. Preliminary version in FOCS 2014.
Georg Anegg, Haris Angelidakis, Adam Kurpisz, and Rico Zenklusen. A technique for obtaining true approximations for k-center with covering constraints. Math. Program., 192(1):3-27, 2022. URL: https://doi.org/10.1007/s10107-021-01645-y.
https://doi.org/10.1007/s10107-021-01645-y
V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local Search Heuristics for k-Median and Facility Location Problems. SIAM Journal on Computing (SICOMP), 33(3):544-562, 2004.
Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The hardness of approximation of euclidean k-means. CoRR, 2015. URL: http://arxiv.org/abs/1502.03316.
http://arxiv.org/abs/1502.03316
Tanvi Bajpai, Deeparnab Chakrabarty, Chandra Chekuri, and Maryam Negahbani. Revisiting Priority k-Center: Fairness and Outliers. In Proc., International Colloquium on Automata, Languages and Programming (ICALP), pages 21:1-21:20, 2021.
Jaroslaw Byrka, Fabrizio Grandoni, Thomas Rothvoss, and Laura Sanità. Steiner tree approximation via iterative randomized rounding. Journal of the ACM, 60(1):1-33, 2013. URL: https://doi.org/10.1145/2432622.2432628.
https://doi.org/10.1145/2432622.2432628
Jarosław Byrka, Thomas Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median and positive correlation in budgeted optimization. ACM Trans. Algorithms, 2017.
Deeparnab Chakrabarty, Chandra Chekuri, Sanjeev Khanna, and Nitish Korula. Approximability of capacitated network design. Algorithmica, 72(2):493-514, 2015.
Deeparnab Chakrabarty, Prachi Goyal, and Ravishankar Krishnaswamy. The non-uniform k-center problem. ACM Trans. Algorithms, 2020. Preliminary version in ICALP 2016.
Deeparnab Chakrabarty and Maryam Negahbani. Generalized center problems with outliers. ACM Trans. Algorithms, 2019.
Deeparnab Chakrabarty and Maryam Negahbani. Better algorithms for individually fair k-clustering. Adv. in Neu. Inf. Proc. Sys. (NeurIPS), 2021.
Deeparnab Chakrabarty, Maryam Negahbani, and Ankita Sarkar. Approximation algorithms for continuous clustering and facility location problems. CoRR, 2022. URL: http://arxiv.org/abs/2206.15105.
http://arxiv.org/abs/2206.15105
Moses. Charikar and Sudipto. Guha. Improved Combinatorial Algorithms for the Facility Location and k-Median Problems. In Proc., IEEE Symposium on Foundations of Computer Science (FOCS), 1999.
Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. A constant-factor approximation algorithm for the k-median problem. Journal of Computer and System Sciences, 65(1):129-149, 2002.
Moses Charikar, Samir Khuller, David M. Mount, and Giri Narasimhan. Algorithms for facility location problems with outliers. In Proc., ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 642-651, 2001.
Ke Chen. On k-median clustering in high dimensions. In Proc., ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1177-1185, 2006.
Vincent Cohen-Addad and Karthik C.S. Inapproximability of clustering in lp metrics. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 519-539, 2019. URL: https://doi.org/10.1109/FOCS.2019.00040.
https://doi.org/10.1109/FOCS.2019.00040
Vincent Cohen-Addad, Hossein Esfandiari, Vahab Mirrokni, and Shyam Narayanan. Improved approximations for Euclidean k-means and k-median, via nested quasi-independent sets. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022, pages 1621-1628, New York, NY, USA, 2022. Association for Computing Machinery. URL: https://doi.org/10.1145/3519935.3520011.
https://doi.org/10.1145/3519935.3520011
Vincent Cohen-Addad, C. S. Karthik, and Euiwoong Lee. On approximability of clustering problems without candidate centers. In Proc., ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2635-2648, 2021.
Vincent Cohen-Addad, Philip N Klein, and Claire Mathieu. Local search yields approximation schemes for k-means and k-median in euclidean and minor-free metrics. SIAM Journal on Computing (SICOMP), 48(2):644-667, 2019.
Vincent Cohen-Addad, Karthik C. S., and Euiwoong Lee. Johnson coverage hypothesis: Inapproximability of k-means and k-median in 𝓁_p-metrics. In Proceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1493-1530, 2022. URL: https://doi.org/10.1137/1.9781611977073.63.
https://doi.org/10.1137/1.9781611977073.63
Vincent Cohen-Addad, David Saulpic, and Chris Schwiegelshohn. A new coreset framework for clustering. In Proc., ACM Symposium on Theory of Computing (STOC), pages 169-182, 2021.
W Fernandez De La Vega, Marek Karpinski, Claire Kenyon, and Yuval Rabani. Approximation schemes for clustering problems. In Proc., ACM Symposium on Theory of Computing (STOC), pages 50-58, 2003.
Hu Ding, Haikuo Yu, and Zixiu Wang. Greedy strategy works for k-center clustering with outliers and coreset construction. In Proc., European Symposium on Algorithms, pages 40:1-40:16, 2019.
Dan Feldman, Morteza Monemizadeh, and Christian Sohler. A PTAS for k-means clustering based on weak coresets. In Proceedings of the 5th Annual Symposium on Computational Geometry (SoCG), pages 11-18, 2007.
Zachary Friggstad, Mohsen Rezapour, and Mohammad R Salavatipour. Local search yields a PTAS for k-means in doubling metrics. SIAM Journal on Computing (SICOMP), 48(2):452-480, 2019.
Martin Grötschel, László Lovász, and Alexander Schriver. Geometric Algorithms and Combinatorial Optimization. Springer, Berlin, Heidelberg, 1993.
Sudipto Guha and Samir Khuller. Greedy strikes back: Improved facility location algorithms. Journal of Algorithms, 31(1):228-248, 1999.
Anupam Gupta and Kanat Tangwongsan. Simpler analyses of local search algorithms for facility location. CoRR, 2008. URL: http://arxiv.org/abs/0809.2554.
http://arxiv.org/abs/0809.2554
S. Har-Peled and S. Mazumdar. Coresets for k-means and k-median clustering and their applications. In Proc., ACM Symposium on Theory of Computing (STOC), pages 291-300, 2004.
Dorit S. Hochbaum and David B. Shmoys. A best possible heuristic for the k-center problem. Math. Oper. Res., 10(2):180-184, 1985. URL: https://doi.org/10.1287/moor.10.2.180.
https://doi.org/10.1287/moor.10.2.180
Kamal Jain, Mohammad Mahdian, Evangelos Markakis, Amin Saberi, and Vijay V Vazirani. Greedy facility location algorithms analyzed using dual fitting with factor-revealing LP. Journal of the ACM, 50(6):795-824, 2003.
Kamal Jain and Vijay V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangean relaxation. Journal of the ACM, 48(2):274-296, 2001.
Christopher Jung, Sampath Kannan, and Neil Lutz. Service in your neighborhood: Fairness in center location. In Proceedings, Foundations of Responsible Computing, FORC 2020, volume 156, pages 5:1-5:15, 2020.
Tapas Kanungo, David M Mount, Nathan S Netanyahu, Christine D Piatko, Ruth Silverman, and Angela Y Wu. A local search approximation algorithm for k-means clustering. Computational Geometry, 28(2-3):89-112, 2004.
A. Kumar, Y. Sabharwal, and S. Sen. A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions. In Proc., IEEE Symposium on Foundations of Computer Science (FOCS), pages 454-462, 2004.
Shi Li. A 1.488 approximation algorithm for the uncapacitated facility location problem. Information and Computation, 222:45-58, 2013.
Shi Li and Ola Svensson. Approximating k-median via pseudo-approximation. SIAM Journal on Computing (SICOMP), 45(2):530-547, 2016.
Sepideh Mahabadi and Ali Vakilian. (individual) fairness for k-clustering. In Proc., International Conference on Machine Learning (ICML), pages 7925-7935, 2020.
Jirı Matoušek. On approximate geometric k-clustering. Discrete & Computational Geometry, 24(1):61-84, 2000.
Ján Plesník. A heuristic for the p-center problems in graphs. Discrete Applied Mathematics, 17(3):263-268, 1987.
Robert D. Carr, Toshihiro Fujito, Goran Konjevod, and Ojas Parekh. A 21/10-approximation Algorithm for a Generalization of the Weighted Edge-Dominating Set Problem. J. Comb. Optim., 2001. Preliminary version appeared in Proc. 8th ESA, pages 132-142, 2000.
David B. Shmoys, Éva Tardos, and Karen Aardal. Approximation algorithms for facility location problems (extended abstract). In Proc., ACM Symposium on Theory of Computing (STOC), 1997.
Luca Trevisan. When hamming meets euclid: The approximability of geometric tsp and steiner tree. SIAM Journal on Computing, 30(2):475-485, 2000. URL: https://doi.org/10.1137/S0097539799352735.
https://doi.org/10.1137/S0097539799352735
Ali Vakilian and Mustafa Yalciner. Improved approximation algorithms for individually fair clustering. In Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera, editors, Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pages 8758-8779. PMLR, 28-30 March 2022. URL: https://proceedings.mlr.press/v151/vakilian22a.html.
https://proceedings.mlr.press/v151/vakilian22a.html
Deeparnab Chakrabarty, Maryam Negahbani, and Ankita Sarkar
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode