On k-Means for Segments and Polylines

Authors: Sergio Cabello, Panos Giannopoulos



File

LIPIcs.ESA.2023.28.pdf
  • Filesize: 0.74 MB
  • 14 pages

Author Details

Sergio Cabello
  • Faculty of Mathematics and Physics, University of Ljubljana, Slovenia
  • Institute of Mathematics, Physics and Mechanics, Ljubljana, Slovenia
Panos Giannopoulos
  • Department of Computer Science, City, University of London, UK

Cite As

Sergio Cabello and Panos Giannopoulos. On k-Means for Segments and Polylines. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 28:1-28:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/LIPIcs.ESA.2023.28

Abstract

We study the problem of k-means clustering in the space of straight-line segments in ℝ² under the Hausdorff distance. For this problem, we give a (1+ε)-approximation algorithm that, for an input of n segments, for any fixed k, and with constant success probability, runs in time O(n + ε^{-O(k)} + ε^{-O(k)} ⋅ log^{O(k)}(ε^{-1})). The algorithm has two main ingredients. Firstly, we express the k-means objective in our metric space as a sum of algebraic functions and use the optimization technique of Vigneron [Antoine Vigneron, 2014] to approximate its minimum. Secondly, we reduce the input size by computing a small-size coreset using the sensitivity-based sampling framework by Feldman and Langberg [Dan Feldman and Michael Langberg, 2011; Feldman et al., 2020]. Our results can be extended to polylines of constant complexity with a running time of O(n + ε^{-O(k)}).
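To make the objective in the abstract concrete, the following is a minimal Python sketch of how the k-means cost of a set of segments could be evaluated for given candidate centers. It is not the authors' algorithm: it only evaluates the cost and does not perform the optimization or the coreset construction. The function names (`point_segment_distance`, `hausdorff_segments`, `kmeans_cost`) and the representation of a segment as a pair of endpoint tuples are assumptions made for illustration. The sketch relies on the standard fact that the distance to a convex set is a convex function, so the directed Hausdorff distance from one segment to another is attained at an endpoint.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment with endpoints a, b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    if len2 == 0.0:
        # Degenerate segment: a single point.
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def hausdorff_segments(s, t):
    """Hausdorff distance between two segments in the plane.

    Each segment is a pair of endpoints (illustrative representation).
    The maximum of the distance to a segment over another segment is
    attained at an endpoint, so four point-segment distances suffice.
    """
    return max(
        point_segment_distance(s[0], t[0], t[1]),
        point_segment_distance(s[1], t[0], t[1]),
        point_segment_distance(t[0], s[0], s[1]),
        point_segment_distance(t[1], s[0], s[1]),
    )

def kmeans_cost(segments, centers):
    """Sum over input segments of the squared Hausdorff distance
    to the nearest center segment."""
    return sum(
        min(hausdorff_segments(s, c) for c in centers) ** 2
        for s in segments
    )

if __name__ == "__main__":
    segs = [((0, 0), (1, 0)), ((0, 1), (1, 1)), ((5, 5), (6, 5))]
    centers = [((0, 0), (1, 0)), ((5, 5), (6, 5))]
    print(kmeans_cost(segs, centers))
```

In this toy input, the first and third segments coincide with a center and the second is at Hausdorff distance 1 from the first center, so only the second segment contributes to the cost.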

Subject Classification

ACM Subject Classification
  • Theory of computation → Computational geometry
Keywords
  • k-means clustering
  • segments
  • polylines
  • Hausdorff distance
  • Fréchet mean

References

  1. Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. SIAM J. Comput., 49(4), 2020.
  2. Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The hardness of approximation of Euclidean k-means. In 31st International Symposium on Computational Geometry, SoCG 2015, volume 34 of LIPIcs, pages 754-767. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2015.
  3. Sayan Bandyapadhyay and Kasturi R. Varadarajan. On variants of k-means clustering. In 32nd International Symposium on Computational Geometry, SoCG 2016, volume 51 of LIPIcs, pages 14:1-14:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.
  4. Saugata Basu, Richard Pollack, and Marie-Françoise Roy. Algorithms in Real Algebraic Geometry. Springer Berlin, Heidelberg, 2006.
  5. Kevin Buchin, Anne Driemel, Joachim Gudmundsson, Michael Horton, Irina Kostitsyna, Maarten Löffler, and Martijn Struijs. Approximating (k,𝓁)-center clustering for curves. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, pages 2922-2938. SIAM, 2019.
  6. Maike Buchin, Anne Driemel, and Dennis Rohde. Approximating (k,𝓁)-median clustering for polygonal curves. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, pages 2697-2717, 2021.
  7. Frédéric Cazals, Bernard Delmas, and Timothee O'Donnell. Fréchet mean and p-mean on the unit circle: Decidability, algorithm, and applications to clustering on the flat torus. In 19th International Symposium on Experimental Algorithms, SEA 2021, volume 190 of LIPIcs, pages 15:1-15:16, 2021.
  8. Deeparnab Chakrabarty, Maryam Negahbani, and Ankita Sarkar. Approximation algorithms for continuous clustering and facility location problems. In 30th Annual European Symposium on Algorithms, ESA 2022, volume 244 of LIPIcs, pages 33:1-33:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022.
  9. Ke Chen. On coresets for k-median and k-means clustering in metric and Euclidean spaces and their applications. SIAM J. Comput., 39(3):923-947, 2009.
  10. Siu-Wing Cheng and Haoqiang Huang. Curve simplification and clustering under Fréchet distance. In Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, pages 1414-1432. SIAM, 2023.
  11. Vincent Cohen-Addad. A fast approximation scheme for low-dimensional k-means. In Artur Czumaj, editor, Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, pages 430-440. SIAM, 2018.
  12. Vincent Cohen-Addad, Hossein Esfandiari, Vahab S. Mirrokni, and Shyam Narayanan. Improved approximations for Euclidean k-means and k-median, via nested quasi-independent sets. In Stefano Leonardi and Anupam Gupta, editors, STOC '22: 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 1621-1628. ACM, 2022.
  13. Vincent Cohen-Addad, Andreas Emil Feldmann, and David Saulpic. Near-linear time approximation schemes for clustering in doubling metrics. J. ACM, 68(6):44:1-44:34, 2021.
  14. Vincent Cohen-Addad, Anupam Gupta, Amit Kumar, Euiwoong Lee, and Jason Li. Tight FPT approximations for k-median and k-means. In 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, volume 132 of LIPIcs, pages 42:1-42:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
  15. Vincent Cohen-Addad, Karthik C. S., and Euiwoong Lee. On approximability of clustering problems without candidate centers. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, pages 2635-2648, 2021.
  16. Vincent Cohen-Addad, Karthik C. S., and Euiwoong Lee. Johnson coverage hypothesis: Inapproximability of k-means and k-median in 𝓁_p-metrics. In Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, pages 1493-1530. SIAM, 2022.
  17. Vincent Cohen-Addad, Philip N. Klein, and Claire Mathieu. Local search yields approximation schemes for k-means and k-median in Euclidean and minor-free metrics. SIAM Journal on Computing, 48(2):644-667, 2019.
  18. Vincent Cohen-Addad, David Saulpic, and Chris Schwiegelshohn. Improved coresets and sublinear algorithms for power means in Euclidean spaces. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, pages 21085-21098, 2021.
  19. Anne Driemel, Amer Krivosija, and Christian Sohler. Clustering time series under the Fréchet distance. In Robert Krauthgamer, editor, Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, pages 766-785. SIAM, 2016.
  20. Anne Driemel, André Nusser, Jeff M. Phillips, and Ioannis Psarros. The VC dimension of metric balls under Fréchet and Hausdorff distances. Discret. Comput. Geom., 66(4):1351-1381, 2021.
  21. Dan Feldman and Michael Langberg. A unified framework for approximating and clustering data. In Proceedings of the 43rd ACM Symposium on Theory of Computing, STOC 2011, pages 569-578. ACM, 2011.
  22. Dan Feldman, Melanie Schmidt, and Christian Sohler. Turning big data into tiny data: Constant-size coresets for k-means, PCA, and projective clustering. SIAM Journal on Computing, 49(3):601-657, 2020.
  23. Daniel Ferguson and François G. Meyer. Computation of the sample Fréchet mean for sets of large graphs with applications to regression. In Proceedings of the 2022 SIAM International Conference on Data Mining, SDM 2022, pages 379-387, 2022.
  24. Maurice Fréchet. Les éléments aléatoires de nature quelconque dans un espace distancié. Annales de L'Institut Henri Poincaré, 10(4):215-310, 1948.
  25. Fabrizio Grandoni, Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, and Rakesh Venkat. A refined approximation for Euclidean k-means. Inf. Process. Lett., 176:106251, 2022.
  26. Sudipto Guha and Samir Khuller. Greedy strikes back: Improved facility location algorithms. J. Algorithms, 31(1):228-248, 1999.
  27. Sariel Har-Peled. Geometric Approximation Algorithms. American Mathematical Society, 2011.
  28. Sariel Har-Peled and Soham Mazumdar. On coresets for k-means and k-median clustering. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing, pages 291-300. ACM, 2004.
  29. Piotr Indyk. Sublinear time algorithms for metric space problems. In Proceedings of the 31st Annual ACM Symposium on Theory of Computing, pages 428-434. ACM, 1999.
  30. Eric D. Kolaczyk, Lizhen Lin, Steven J. Rosenberg, Jie Xu, and Jackson Walters. Averages of unlabeled networks: Geometric characterization and asymptotic behavior. The Annals of Statistics, 48(1):514-538, 2020.
  31. Amit Kumar, Yogish Sabharwal, and Sandeep Sen. Linear-time approximation schemes for clustering problems in any dimensions. J. ACM, 57(2):5:1-5:32, 2010.
  32. Yair Marom and Dan Feldman. k-means clustering of lines for big data. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, pages 12797-12806, 2019.
  33. Jiří Matoušek. Lectures on Discrete Geometry, volume 212 of Graduate texts in mathematics. Springer, 2002.
  34. Nimrod Megiddo and Kenneth J. Supowit. On the complexity of some common geometric location problems. SIAM J. Comput., 13(1):182-196, 1984.
  35. François G. Meyer. The Fréchet mean of inhomogeneous random graphs. In Complex Networks & Their Applications X - Volume 1, Proceedings of the Tenth International Conference on Complex Networks and Their Applications, COMPLEX NETWORKS 2021, volume 1015 of Studies in Computational Intelligence, pages 207-219, 2021.
  36. Yuriy Mileyko, Sayan Mukherjee, and John Harer. Probability measures on the space of persistence diagrams. Inverse Problems, 27(12):124007, 2011.
  37. Abhinandan Nath and Erin Taylor. k-median clustering under discrete Fréchet and Hausdorff distances. J. Comput. Geom., 12:156-182, 2022.
  38. Christof Schötz. The Fréchet Mean and Statistics in Non-Euclidean Spaces. PhD thesis, Heidelberg University, The Faculty of Mathematics and Computer Science, 2021.
  39. Katharine Turner, Yuriy Mileyko, Sayan Mukherjee, and John Harer. Fréchet means for distributions of persistence diagrams. Discret. Comput. Geom., 52(1):44-70, 2014.
  40. Antoine Vigneron. Geometric optimization and sums of algebraic functions. ACM Trans. Algorithms, 10(1):4:1-4:20, 2014.