Parameterized k-Clustering: Tractability Island

Authors Fedor V. Fomin, Petr A. Golovach, Kirill Simonov



PDF
Thumbnail PDF

File

LIPIcs.FSTTCS.2019.14.pdf
  • Filesize: 0.54 MB
  • 15 pages

Document Identifiers

Author Details

Fedor V. Fomin
  • Department of Informatics, University of Bergen, Norway
Petr A. Golovach
  • Department of Informatics, University of Bergen, Norway
Kirill Simonov
  • Department of Informatics, University of Bergen, Norway

Cite As Get BibTex

Fedor V. Fomin, Petr A. Golovach, and Kirill Simonov. Parameterized k-Clustering: Tractability Island. In 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 150, pp. 14:1-14:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/LIPIcs.FSTTCS.2019.14

Abstract

In k-Clustering we are given a multiset of n vectors X subset Z^d and a nonnegative number D, and we need to decide whether X can be partitioned into k clusters C_1, ..., C_k such that the cost sum_{i=1}^k min_{c_i in R^d} sum_{x in C_i} |x-c_i|_p^p <= D, where |*|_p is the Minkowski (L_p) norm of order p. For p=1, k-Clustering is the well-known k-Median. For p=2, the case of the Euclidean distance, k-Clustering is k-Means. We study k-Clustering from the perspective of parameterized complexity. The problem is known to be NP-hard for k=2 and it is also NP-hard for d=2. It is a long-standing open question, whether the problem is fixed-parameter tractable (FPT) for the combined parameter d+k. In this paper, we focus on the parameterization by D. We complement the known negative results by showing that for p=0 and p=infty, k-Clustering is W1-hard when parameterized by D. Interestingly, the complexity landscape of the problem appears to be more intricate than expected. We discover a tractability island of k-Clustering: for every p in (0,1], k-Clustering is solvable in time 2^O(D log D) (nd)^O(1).

Subject Classification

ACM Subject Classification
  • Theory of computation → Parameterized complexity and exact algorithms
Keywords
  • clustering
  • parameterized complexity
  • k-means
  • k-median

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Marcel R. Ackermann, Johannes Blömer, and Christian Sohler. Clustering for metric and nonmetric distance measures. ACM Trans. Algorithms, 6(4):59:1-59:26, 2010. URL: https://doi.org/10.1145/1824777.1824779.
  2. Pankaj K. Agarwal, Sariel Har-Peled, and Kasturi R. Varadarajan. Approximating extent measures of points. J. ACM, 51(4):606-635, 2004. URL: https://doi.org/10.1145/1008731.1008736.
  3. Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas Popat. NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2):245-248, May 2009. URL: https://doi.org/10.1007/s10994-009-5103-0.
  4. Noga Alon, Raphael Yuster, and Uri Zwick. Color-Coding. J. ACM, 42(4):844-856, 1995. URL: https://doi.org/10.1145/210332.210337.
  5. D. Angluin and L.G. Valiant. Fast probabilistic algorithms for hamiltonian circuits and matchings. J. Computer and System Sciences, 18(2):155-193, 1979. URL: https://doi.org/10.1016/0022-0000(79)90045-X.
  6. Mihai Badoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC), pages 250-257. ACM, 2002. URL: https://doi.org/10.1145/509907.509947.
  7. Christina Boucher, Christine Lo, and Daniel Lokshantov. Consensus Patterns (Probably) Has no EPTAS. In Nikhil Bansal and Irene Finocchi, editors, Algorithms - ESA 2015, pages 239-250, Berlin, Heidelberg, 2015. Springer Berlin Heidelberg. Google Scholar
  8. Christos Boutsidis, Anastasios Zouzias, Michael W. Mahoney, and Petros Drineas. Randomized Dimensionality Reduction for k-Means Clustering. IEEE Trans. Information Theory, 61(2):1045-1062, 2015. URL: https://doi.org/10.1109/TIT.2014.2375327.
  9. Michael B Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu. Dimensionality reduction for k-means clustering and low rank approximation. In Proceedings of the 47tg annual ACM symposium on Theory of Computing (STOC), pages 163-172. ACM, 2015. Google Scholar
  10. Vincent Cohen-Addad. A Fast Approximation Scheme for Low-dimensional K-means. In Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 430-440. SIAM, 2018. URL: http://dl.acm.org/citation.cfm?id=3174304.3175298.
  11. Vincent Cohen-Addad, Arnaud de Mesmay, Eva Rotenberg, and Alan Roytman. The Bane of Low-dimensionality Clustering. In Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 441-456. SIAM, 2018. URL: http://dl.acm.org/citation.cfm?id=3174304.3175300.
  12. Marek Cygan, Fedor V. Fomin, Lukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michal Pilipczuk, and Saket Saurabh. Parameterized Algorithms. Springer, 2015. URL: https://doi.org/10.1007/978-3-319-21275-3.
  13. W. Fernandez de la Vega, Marek Karpinski, Claire Kenyon, and Yuval Rabani. Approximation Schemes for Clustering Problems. In Proceedings of the 35th Annual ACM Symposium on Theory of Computing (STOC), pages 50-58. ACM, 2003. URL: https://doi.org/10.1145/780542.780550.
  14. Rodney G. Downey and Michael R. Fellows. Fundamentals of Parameterized Complexity. Texts in Computer Science. Springer, 2013. URL: https://doi.org/10.1007/978-1-4471-5559-1.
  15. Uriel Feige. NP-hardness of hypercube 2-segmentation. CoRR, abs/1411.0821, 2014. URL: http://arxiv.org/abs/1411.0821.
  16. Dan Feldman and Michael Langberg. A unified framework for approximating and clustering data. In Proceedings of the 43rd Annual ACM Symposium on Theory of Computing (STOC), pages 569-578. ACM, 2011. URL: https://doi.org/10.1145/1993636.1993712.
  17. Dan Feldman, Melanie Schmidt, and Christian Sohler. Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering. In Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1434-1453. SIAM, 2013. URL: https://doi.org/10.1137/1.9781611973105.
  18. Fedor V. Fomin, Petr A. Golovach, and Fahad Panolan. Parameterized Low-Rank Binary Matrix Approximation. In Proceedings of the 45th International Colloquium on Automata, Languages, and Programming (ICALP), volume 107 of LIPIcs, pages 53:1-53:16. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.53.
  19. Fedor V. Fomin, Petr A. Golovach, and Kirill Simonov. Parameterized k-Clustering: The distance matters!, 2019. URL: http://arxiv.org/abs/1902.08559.
  20. Sariel Har-Peled and Soham Mazumdar. On coresets for k-means and k-median clustering. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), pages 291-300. ACM, 2004. Google Scholar
  21. Mary Inaba, Naoki Katoh, and Hiroshi Imai. Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering. In Proceedings of the 10th annual Symposium on Computational Geometry (SoCG), pages 332-339. ACM, 1994. Google Scholar
  22. Anil K Jain. Data clustering: 50 years beyond K-means. Pattern recognition letters, 31(8):651-666, 2010. Google Scholar
  23. Stavros G. Kolliopoulos and Satish Rao. A Nearly Linear-Time Approximation Scheme for the Euclidean k-Median Problem. SIAM J. Computing, 37(3):757-782, June 2007. URL: https://doi.org/10.1137/S0097539702404055.
  24. Amit Kumar, Yogish Sabharwal, and Sandeep Sen. Linear-time approximation schemes for clustering problems in any dimensions. J. ACM, 57(2):5:1-5:32, 2010. URL: https://doi.org/10.1145/1667053.1667054.
  25. Stuart P. Lloyd. Least squares quantization in PCM. IEEE Trans. Information Theory, 28(2):129-136, 1982. URL: https://doi.org/10.1109/TIT.1982.1056489.
  26. Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan. The Planar k-Means Problem is NP-Hard. In Proceedings of the 3rd International Workshop on Algorithms and Computation (WALCOM), Lecture Notes in Comput. Sci., pages 274-285. Springer, 2009. URL: https://doi.org/10.1007/978-3-642-00202-1_24.
  27. Dániel Marx. Closest Substring Problems with Small Distances. SIAM J. Comput., 38(4):1382-1410, 2008. URL: https://doi.org/10.1137/060673898.
  28. N. Megiddo and K. Supowit. On the Complexity of Some Common Geometric Location Problems. SIAM J. Computing, 13(1):182-196, 1984. URL: https://doi.org/10.1137/0213014.
  29. Moni Naor, Leonard J. Schulman, and Aravind Srinivasan. Splitters and Near-Optimal Derandomization. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science (FOCS), pages 182-191. IEEE, 1995. Google Scholar
  30. Christian Sohler and David P. Woodruff. Strong Coresets for k-Median and Subspace Approximation: Goodbye Dimension. In Proceedings of the 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 802-813. IEEE, 2018. URL: https://doi.org/10.1109/FOCS.2018.00081.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail