Fully Dynamic k-Means Coreset in Near-Optimal Update Time

Authors: Max Dupré la Tour, Monika Henzinger, David Saulpic

File

LIPIcs.ESA.2024.100.pdf
  • Filesize: 0.83 MB
  • 16 pages

Author Details

Max Dupré la Tour
  • McGill University, Montreal, Canada
Monika Henzinger
  • Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria
David Saulpic
  • CNRS & IRIF, Université Paris Cité, France

Cite As

Max Dupré la Tour, Monika Henzinger, and David Saulpic. Fully Dynamic k-Means Coreset in Near-Optimal Update Time. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 100:1-100:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ESA.2024.100

Abstract

In this paper, we study the problem of maintaining a solution to k-median and k-means clustering in a fully dynamic setting. To do so, we present an algorithm that efficiently maintains a coreset, a compressed version of the dataset that allows a clustering solution to be computed easily at query time. Our coreset algorithm has a near-optimal update time of Õ(k) in general metric spaces, which reduces to Õ(d) in the Euclidean space ℝ^d. The query time is O(k²) in general metrics and O(kd) in ℝ^d. This directly yields an algorithm that maintains a constant-factor approximation for k-median and k-means clustering in Euclidean space with update time Õ(d) and query time Õ(kd + k²). For an O(polylog k)-approximation, the query time reduces to Õ(kd).
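The central idea in the abstract is that clustering queries are answered on a small weighted coreset rather than on the full dataset. The Python sketch below illustrates only that query-time step, under stated assumptions. The coreset is taken as given (a list of points with weights). The helper names `clustering_cost` and `weighted_lloyd` are hypothetical. Lloyd's heuristic is a swapped-in solver that does not carry the approximation guarantees stated above. The sketch does not implement the paper's dynamic coreset maintenance.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance in R^d."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def clustering_cost(points: List[Point], weights: List[float],
                    centers: List[Point], z: int = 2) -> float:
    """Weighted clustering cost: z=1 gives k-median, z=2 gives k-means.

    Each point p with weight w pays w * dist(p, nearest center)^z.
    """
    total = 0.0
    for p, w in zip(points, weights):
        nearest = math.sqrt(min(sq_dist(p, c) for c in centers))
        total += w * nearest ** z
    return total


def weighted_lloyd(points: List[Point], weights: List[float], k: int,
                   iters: int = 20, seed: int = 0) -> List[Point]:
    """Hypothetical query-time solver: weighted Lloyd's iterations for k-means.

    Runs on the coreset only, so each round costs O(|coreset| * k * d).
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # naive random initialization (assumption)
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            # Assign p to its nearest current center.
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            mass[j] += w
            for t in range(dim):
                sums[j][t] += w * p[t]
        # Move each center to the weighted mean of its cluster;
        # an empty cluster keeps its previous center.
        centers = [tuple(s / mass[j] for s in sums[j]) if mass[j] > 0
                   else centers[j] for j in range(k)]
    return centers


# Full dataset with repeated points, and the same data stored as a weighted set.
full = [(0.0, 0.0)] * 3 + [(1.0, 0.0)] + [(10.0, 10.0)] * 2 + [(11.0, 10.0)] * 2
coreset = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
weights = [3.0, 1.0, 2.0, 2.0]

centers = weighted_lloyd(coreset, weights, k=2)
print(sorted(centers))                                   # → [(0.25, 0.0), (10.5, 10.0)]
print(clustering_cost(coreset, weights, centers))        # → 1.75
print(clustering_cost(full, [1.0] * len(full), centers)) # → 1.75
```

Collapsing duplicate points into one weighted point is the simplest possible coreset, and it is exact: for every choice of centers, the weighted cost equals the cost on the full data. A genuine coreset is far smaller than the input and preserves the cost only up to a (1 ± ε) factor for every set of k centers. Passing `z=1` to `clustering_cost` evaluates the k-median objective instead of k-means.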

Subject Classification

ACM Subject Classification
  • Theory of computation → Facility location and clustering
Keywords
  • clustering
  • fully-dynamic
  • coreset
  • k-means
