Exact Exponential Algorithms for Clustering Problems

Authors Fedor V. Fomin , Petr A. Golovach , Tanmay Inamdar , Nidhi Purohit, Saket Saurabh



PDF
Thumbnail PDF

File

LIPIcs.IPEC.2022.13.pdf
  • Filesize: 0.88 MB
  • 14 pages

Document Identifiers

Author Details

Fedor V. Fomin
  • Department of Informatics, University of Bergen, Norway
Petr A. Golovach
  • Department of Informatics, University of Bergen, Norway
Tanmay Inamdar
  • Department of Informatics, University of Bergen, Norway
Nidhi Purohit
  • Department of Informatics, University of Bergen, Norway
Saket Saurabh
  • The Institute of Mathematical Sciences, HBNI, Chennai, India
  • Department of Informatics, University of Bergen, Norway

Cite AsGet BibTex

Fedor V. Fomin, Petr A. Golovach, Tanmay Inamdar, Nidhi Purohit, and Saket Saurabh. Exact Exponential Algorithms for Clustering Problems. In 17th International Symposium on Parameterized and Exact Computation (IPEC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 249, pp. 13:1-13:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.IPEC.2022.13

Abstract

In this paper we initiate a systematic study of exact algorithms for some of the well known clustering problems, namely k-MEDIAN and k-MEANS. In k-MEDIAN, the input consists of a set X of n points belonging to a metric space, and the task is to select a subset C ⊆ X of k points as centers, such that the sum of the distances of every point to its nearest center is minimized. In k-MEANS, the objective is to minimize the sum of squares of the distances instead. It is easy to design an algorithm running in time max_{k ≤ n} {n choose k} n^𝒪(1) = 𝒪^*(2ⁿ) (here, 𝒪^*(⋅) notation hides polynomial factors in n). In this paper we design first non-trivial exact algorithms for these problems. In particular, we obtain an 𝒪^*((1.89)ⁿ) time exact algorithm for k-MEDIAN that works for any value of k. Our algorithm is quite general in that it does not use any properties of the underlying (metric) space - it does not even require the distances to satisfy the triangle inequality. In particular, the same algorithm also works for k-Means. We complement this result by showing that the running time of our algorithm is asymptotically optimal, up to the base of the exponent. That is, unless the Exponential Time Hypothesis fails, there is no algorithm for these problems running in time 2^o(n)⋅n^𝒪(1). Finally, we consider the "facility location" or "supplier" versions of these clustering problems, where, in addition to the set X we are additionally given a set of m candidate centers (or facilities) F, and objective is to find a subset of k centers from F. The goal is still to minimize the k-Median/k-Means/k-Center objective. For these versions we give a 𝒪(2ⁿ (mn)^𝒪(1)) time algorithms using subset convolution. We complement this result by showing that, under the Set Cover Conjecture, the "supplier" versions of these problems do not admit an exact algorithm running in time 2^{(1-ε) n} (mn)^𝒪(1).

Subject Classification

ACM Subject Classification
  • Theory of computation → Facility location and clustering
  • Theory of computation → Parameterized complexity and exact algorithms
Keywords
  • clustering
  • k-median
  • k-means
  • exact algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Pankaj K Agarwal and Cecilia Magdalena Procopiuc. Exact and approximation algorithms for clustering. Algorithmica, 33(2):201-226, 2002. Google Scholar
  2. Andreas Björklund, Thore Husfeldt, and Mikko Koivisto. Set partitioning via inclusion-exclusion. SIAM Journal on Computing, 39(2):546-563, 2009. Google Scholar
  3. Jérémie Chalopin and Daniël Paulusma. Packing bipartite graphs with covers of complete bipartite graphs. Discret. Appl. Math., 168:40-50, 2014. URL: https://doi.org/10.1016/j.dam.2012.08.026.
  4. Vincent Cohen-Addad, Anupam Gupta, Amit Kumar, Euiwoong Lee, and Jason Li. Tight fpt approximations for k-median and k-means. arXiv preprint, 2019. URL: http://arxiv.org/abs/1904.12334.
  5. Marek Cygan, Holger Dell, Daniel Lokshtanov, Dániel Marx, Jesper Nederlof, Yoshio Okamoto, Ramamohan Paturi, Saket Saurabh, and Magnus Wahlström. On problems as hard as cnf-sat. ACM Trans. Algorithms, 12, 2016. Google Scholar
  6. Marek Cygan, Fedor V Fomin, Łukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michał Pilipczuk, and Saket Saurabh. Parameterized algorithms, volume 5(4). Springer, 2015. Google Scholar
  7. Fedor V Fomin, Serge Gaspers, Daniel Lokshtanov, and Saket Saurabh. Exact algorithms via monotone local search. Journal of the ACM (JACM), 66(2):1-23, 2019. Google Scholar
  8. Fedor V Fomin, Fabrizio Grandoni, and Dieter Kratsch. A measure & conquer approach for the analysis of exact algorithms. Journal of the ACM (JACM), 56(5):1-32, 2009. Google Scholar
  9. M. R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979. Google Scholar
  10. Wen-Lian Hsu and George L Nemhauser. Easy and hard bottleneck location problems. Discrete Applied Mathematics, 1(3):209-215, 1979. Google Scholar
  11. Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-sat. J. Comput. Syst. Sci., 9:367-375, 2001. Google Scholar
  12. Yoichi Iwata. A faster algorithm for dominating set analyzed by the potential method. In International Symposium on Parameterized and Exact Computation, pages 41-54. Springer, 2011. Google Scholar
  13. Edmonds Jack. Paths, trees, and flowers. Canadian Journal of Mathematics, 17:449-467, 1965. Google Scholar
  14. Kamal Jain and Vijay V Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation. Journal of the ACM (JACM), 48(2):274-296, 2001. Google Scholar
  15. David G. Kirkpatrick and Pavol Hell. On the complexity of general graph factor problems. SIAM J. Comput., 12(3):601-609, 1983. URL: https://doi.org/10.1137/0212040.
  16. Dieter Kratsch and FV Fomin. Exact exponential algorithms. Springer-Verlag Berlin Heidelberg, 2010. Google Scholar
  17. Stuart Lloyd. Least squares quantization in pcm. IEEE transactions on information theory, 28(2):129-137, 1982. Google Scholar
  18. Daniel Lokshtanov, Dániel Marx, and Saket Saurabh. Lower bounds based on the exponential time hypothesis. Bull. EATCS, 105:41-72, 2011. Google Scholar
  19. James MacQueen et al. Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, volume 1(14), pages 281-297. Oakland, CA, USA, 1967. Google Scholar
  20. Hugo Steinhaus et al. Sur la division des corps matériels en parties. Bull. Acad. Polon. Sci, 1(804):801, 1956. Google Scholar
  21. Ola Svensson, Jakub Tarnawski, and László A Végh. A constant-factor approximation algorithm for the asymmetric traveling salesman problem. Journal of the ACM (JACM), 67(6):1-53, 2020. Google Scholar
  22. Vera Traub and Jens Vygen. An improved approximation algorithm for atsp. In Proceedings of the 52nd annual ACM SIGACT symposium on theory of computing, pages 1-13, 2020. Google Scholar
  23. Wikipedia. Geometric median - Wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=Geometric%20median&oldid=1061179886, 2022. [Online; accessed 21-April-2022].
  24. Wikipedia. Weber problem - Wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=Weber%20problem&oldid=916663348, 2022. [Online; accessed 21-April-2022].
  25. Gerhard J Woeginger. Exact algorithms for np-hard problems: A survey. In Combinatorial optimization - eureka, you shrink!, pages 185-207. Springer, 2003. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail