Universal Algorithms for Clustering Problems

Ganesh, Arun; Maggs, Bruce M.; Panigrahi, Debmalya

doi:10.4230/LIPIcs.ICALP.2021.70

Abstract

This paper presents universal algorithms for clustering problems, including the widely studied k-median, k-means, and k-center objectives. The input is a metric space containing all potential client locations. The algorithm must select k cluster centers such that they are a good solution for any subset of clients that actually realize. Specifically, we aim for low regret, defined as the maximum over all subsets of the difference between the cost of the algorithm’s solution and that of an optimal solution. A universal algorithm’s solution sol for a clustering problem is said to be an (α, β)-approximation if for all subsets of clients C', it satisfies sol(C') ≤ α ⋅ opt(C') + β ⋅ mr, where opt(C') is the cost of the optimal solution for clients C' and mr is the minimum regret achievable by any solution. Our main results are universal algorithms for the standard clustering objectives of k-median, k-means, and k-center that achieve (O(1), O(1))-approximations. These results are obtained via a novel framework for universal algorithms using linear programming (LP) relaxations. These results generalize to other 𝓁_p-objectives and the setting where some subset of the clients are fixed. We also give hardness results showing that (α, β)-approximation is NP-hard if α or β is at most a certain constant, even for the widely studied special case of Euclidean metric spaces. This shows that in some sense, (O(1), O(1))-approximation is the strongest type of guarantee obtainable for universal clustering.

Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. In Proceedings of the 58th Annual IEEE Symposium on Foundations of Computing, pages 61-72, October 2017. URL: https://doi.org/10.1109/FOCS.2017.15.
N. Alon and Y. Azar. On-line steiner trees in the Euclidean plane. In Proceedings of the 8th Annual Symposium on Computational Geometry, pages 337-343, 1992.
Aaron Archer, Ranjithkumar Rajagopalan, and David B. Shmoys. Lagrangian relaxation for the k-median problem: New insights and continuity properties. In Giuseppe Di Battista and Uri Zwick, editors, Algorithms - ESA 2003: 11th Annual European Symposium, Budapest, Hungary, September 16-19, 2003. Proceedings, pages 31-42. Springer Berlin Heidelberg, Berlin, Heidelberg, 2003. URL: https://doi.org/10.1007/978-3-540-39658-1_6.
Sanjeev Arora, Prabhakar Raghavan, and Satish Rao. Approximation schemes for Euclidean k-medians and related problems. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC '98, pages 106-113, New York, NY, USA, 1998. ACM. URL: https://doi.org/10.1145/276698.276718.
Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristic for k-median and facility location problems. In Proceedings of the Thirty-third Annual ACM Symposium on Theory of Computing, STOC '01, pages 21-29, New York, NY, USA, 2001. ACM. URL: https://doi.org/10.1145/380752.380755.
I. Averbakh and Oded Berman. Minimax regret p-center location on a network with demand uncertainty. Location Science, 5(4):247-254, 1997. URL: https://doi.org/10.1016/S0966-8349(98)00033-3.
Igor Averbakh and Oded Berman. Minmax regret median location on a network under uncertainty. INFORMS Journal on Computing, 12(2):104-110, 2000. URL: https://doi.org/10.1287/ijoc.12.2.104.11897.
D. Bertsimas and M. Grigni. Worst-case examples for the spacefilling curve heuristic for the Euclidean traveling salesman problem. Operations Research Letter, 8(5):241-244, October 1989.
Anand Bhalgat, Deeparnab Chakrabarty, and Sanjeev Khanna. Optimal lower bounds for universal and differentially private Steiner trees and TSPs. In Leslie Ann Goldberg, Klaus Jansen, R. Ravi, and José D. P. Rolim, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 75-86, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg.
Costas Busch, Chinmoy Dutta, Jaikumar Radhakrishnan, Rajmohan Rajaraman, and Srinivasagopalan Srivathsan. Split and join: Strong partitions and universal Steiner trees for graphs. In 53rd Annual IEEE Symposium on Foundations of Computer Science, FOCS 2012, New Brunswick, NJ, USA, October 20-23, 2012, pages 81-90, 2012.
Jaroslaw Byrka, Fabrizio Grandoni, Thomas Rothvoß, and Laura Sanità. Steiner tree approximation via iterative randomized rounding. J. ACM, 60(1):6:1-6:33, 2013.
Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM J. Comput., 40(6):1740-1766, 2011. URL: https://doi.org/10.1137/080733991.
Deeparnab Chakrabarty and Chaitanya Swamy. Approximation algorithms for minimum norm and ordered optimization problems. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, page 126–137, New York, NY, USA, 2019. Association for Computing Machinery. URL: https://doi.org/10.1145/3313276.3316322.
Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. A constant-factor approximation algorithm for the k-median problem (extended abstract). In Proceedings of the Thirty-first Annual ACM Symposium on Theory of Computing, STOC '99, pages 1-10, New York, NY, USA, 1999. ACM. URL: https://doi.org/10.1145/301250.301257.
Teofilo F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci., 38:293-306, 1985.
Igor Gorodezky, Robert D. Kleinberg, David B. Shmoys, and Gwen Spencer. Improved lower bounds for the universal and a priori TSP. In Maria Serna, Ronen Shaltiel, Klaus Jansen, and José Rolim, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 178-191, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.
F. Grandoni, A. Gupta, S. Leonardi, P. Miettinen, P. Sankowski, and M. Singh. Set covering with our eyes closed. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science, October 2008.
Martin Grötschel, László Lovász, and Alexander Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169-197, 1981.
Sudipto Guha and Kamesh Munagala. Exceeding expectations and clustering uncertain data. In Proceedings of the Twenty-Eighth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’09, page 269–278, New York, NY, USA, 2009. Association for Computing Machinery. URL: https://doi.org/10.1145/1559795.1559836.
Anupam Gupta, Mohammad T. Hajiaghayi, and Harald Räcke. Oblivious network design. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm, SODA '06, pages 970-979, Philadelphia, PA, USA, 2006. Society for Industrial and Applied Mathematics. URL: http://dl.acm.org/citation.cfm?id=1109557.1109665.
Anupam Gupta and Kanat Tangwongsan. Simpler analyses of local search algorithms for facility location. ArXiv, abs/0809.2554, 2008. URL: http://arxiv.org/abs/0809.2554.
Mohammad T. Hajiaghayi, Robert Kleinberg, and Tom Leighton. Improved lower and upper bounds for universal TSP in planar metrics. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 649-658, 2006.
Dorit S. Hochbaum and David B. Shmoys. A best possible heuristic for the k-center problem. Math. Oper. Res., 10(2):180-184, May 1985. URL: https://doi.org/10.1287/moor.10.2.180.
Dorit S. Hochbaum and David B. Shmoys. A unified approach to approximation algorithms for bottleneck problems. J. ACM, 33(3):533-550, May 1986. URL: https://doi.org/10.1145/5925.5933.
Kamal Jain and Vijay V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation. J. ACM, 48(2):274-296, 2001. URL: https://doi.org/10.1145/375827.375845.
L. Jia, G. Lin, G. Noubir, R. Rajaraman, and R. Sundaram. Universal algorithms for TSP, Steiner tree, and set cover. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing, 2005.
Lujun Jia, Guevara Noubir, Rajmohan Rajaraman, and Ravi Sundaram. GIST: Group-independent spanning tree for data aggregation in dense sensor networks. In Phillip B. Gibbons, Tarek Abdelzaher, James Aspnes, and Ramesh Rao, editors, Distributed Computing in Sensor Systems, pages 282-304, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. A local search approximation algorithm for k-means clustering. In Proceedings of the Eighteenth Annual Symposium on Computational Geometry, SCG '02, pages 10-18, New York, NY, USA, 2002. ACM. URL: https://doi.org/10.1145/513400.513402.
Adam Kasperski and Pawel Zielinski. On the existence of an FPTAS for minmax regret combinatorial optimization problems with interval data. Oper. Res. Lett., 35:525-532, 2007.
Samir. Khuller and Yoram J. Sussmann. The capacitated k-center problem. SIAM Journal on Discrete Mathematics, 13(3):403-418, 2000. URL: https://doi.org/10.1137/S0895480197329776.
Stavros G. Kolliopoulos and Satish Rao. A nearly linear-time approximation scheme for the Euclidean k-median problem. In Jaroslav Nešetřil, editor, Algorithms - ESA' 99, pages 378-389, Berlin, Heidelberg, 1999. Springer Berlin Heidelberg.
Panos Kouvelis and Gang Yu. Robust 1-Median Location Problems: Dynamic Aspects and Uncertainty, pages 193-240. Springer US, Boston, MA, 1997. URL: https://doi.org/10.1007/978-1-4757-2620-6_6.
Amit Kumar, Yogish Sabharwal, and Sandeep Sen. A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions. In Proceedings of the 45th IEEE Symposium on Foundations of Computer Science, pages 454-462, 2004.
Shi Li and Ola Svensson. Approximating k-median via pseudo-approximation. In Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, pages 901-910, 2013.
Stuart P. Lloyd. Least squares quantization in pcm. IEEE Trans. Information Theory, 28:129-136, 1982.
Stuart G. Mentzer. Approximability of metric clustering problems. Unpublished manuscript, March 2016.
Viswanath Nagarajan, Baruch Schieber, and Hadas Shachnai. The Euclidean k-supplier problem. In Michel Goemans and José Correa, editors, Integer Programming and Combinatorial Optimization, pages 290-301, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions - i. Mathematical Programming, 14(1):265-294, 1978. URL: https://doi.org/10.1007/BF01588971.
Loren K. Platzman and John J. Bartholdi, III. Spacefilling curves and the planar travelling salesman problem. J. ACM, 36(4):719-737, 1989. URL: https://doi.org/10.1145/76359.76361.
Frans Schalekamp and David B. Shmoys. Algorithms for the universal and a priori TSP. Operations Research Letters, 36(1):1-3, 2008. URL: https://doi.org/10.1016/j.orl.2007.04.009.

Universal Algorithms for Clustering Problems

Authors Arun Ganesh, Bruce M. Maggs, Debmalya Panigrahi

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

Universal Algorithms for Clustering Problems

Authors Arun Ganesh, Bruce M. Maggs, Debmalya Panigrahi

File

Document Identifiers

Author Details

Funding

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

Related Versions

References

Thanks for your feedback!

Could not send message