Fully-Dynamic Coresets

Authors Monika Henzinger , Sagar Kale



PDF
Thumbnail PDF

File

LIPIcs.ESA.2020.57.pdf
  • Filesize: 0.54 MB
  • 21 pages

Document Identifiers

Author Details

Monika Henzinger
  • University of Vienna, Faculty of Computer Science, Austria
Sagar Kale
  • University of Vienna, Faculty of Computer Science, Austria

Cite AsGet BibTex

Monika Henzinger and Sagar Kale. Fully-Dynamic Coresets. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 57:1-57:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ESA.2020.57

Abstract

With input sizes becoming massive, coresets - small yet representative summary of the input - are relevant more than ever. A weighted set C_w that is a subset of the input is an ε-coreset if the cost of any feasible solution S with respect to C_w is within [1±ε] of the cost of S with respect to the original input. We give a very general technique to compute coresets in the fully-dynamic setting where input points can be added or deleted. Given a static (i.e., not dynamic) ε-coreset-construction algorithm that runs in time t(n, ε, λ) and computes a coreset of size s(n, ε, λ), where n is the number of input points and 1-λ is the success probability, we give a fully-dynamic algorithm that computes an ε-coreset with worst-case update time O((log n) ⋅ t(s(n, ε/log n, λ/n), ε/log n, λ/n)) (this bound is stated informally), where the success probability is 1-λ. Our technique is a fully-dynamic analog of the merge-and-reduce technique, which is due to Har-Peled and Mazumdar [Har-Peled and Mazumdar, 2004] and is based on a technique of Bentley and Saxe [Jon Louis Bentley and James B. Saxe, 1980], that applies to the insertion-only setting where points can only be added. Although, our space usage is O(n), our technique works in the presence of an adaptive adversary, and we show that Ω(n) space is required when adversary is adaptive. As a concrete implication of our technique, using the result of Braverman et al. [{Braverman} et al., 2016], we get fully-dynamic ε-coreset-construction algorithms for k-median and k-means with worst-case update time O(ε^{-2} k² log⁵ n log³ k) and coreset size O(ε^{-2} k log n log² k) ignoring log log n and log(1/ε) factors and assuming that ε = Ω(1/poly(n)) and λ = Ω(1/poly(n)) (which are very weak assumptions made only to make these bounds easy to parse). This results in the first fully-dynamic constant-approximation algorithms for k-median and k-means with update times O(poly(k, log n, ε^{-1})). Specifically, the dependence on k is only quadratic, and the bounds are worst-case. The best previous bound for both problems was amortized O(nlog n) by Cohen-Addad et al. [Cohen-Addad et al., 2019] via randomized O(1)-coresets in O(n) space. We also show that under the OMv conjecture [Monika Henzinger et al., 2015], a fully-dynamic (4 - δ)-approximation algorithm for k-means must either have an amortized update time of Ω(k^{1-γ}) or amortized query time of Ω(k^{2 - γ}), where γ > 0 is a constant.

Subject Classification

ACM Subject Classification
  • Theory of computation → Facility location and clustering
Keywords
  • Clustering
  • Coresets
  • Dynamic Algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for k-means and euclidean k-median by primal-dual algorithms. In 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 61-72. IEEE Computer Society, 2017. Google Scholar
  2. Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristics for k-median and facility location problems. SIAM J. Comput., 33(3):544-562, 2004. Google Scholar
  3. Jon Louis Bentley and James B. Saxe. Decomposable searching problems I: Static-to-dynamic transformation. J. Algorithms, pages 301-358, 1980. Google Scholar
  4. Vladimir Braverman, Dan Feldman, and Harry Lang. New Frameworks for Offline and Streaming Coreset Constructions. arXiv e-prints, 2016. URL: http://arxiv.org/abs/1612.00889.
  5. Vladimir Braverman, Gereon Frahling, Harry Lang, Christian Sohler, and Lin F. Yang. Clustering high dimensional dynamic data streams. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 576-585, International Convention Centre, Sydney, Australia, August 06-11 2017. PMLR. Google Scholar
  6. Jaroslaw Byrka, Thomas W. Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median and positive correlation in budgeted optimization. ACM Trans. Algorithms, 13(2):23:1-23:31, 2017. Google Scholar
  7. Moses Charikar and Sudipto Guha. Improved combinatorial algorithms for facility location problems. SIAM J. Comput., 34(4):803-824, 2005. Google Scholar
  8. Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. A constant-factor approximation algorithm for the k-median problem. J. Comput. Syst. Sci., 65(1):129-149, 2002. Google Scholar
  9. Ke Chen. On coresets for k-median and k-means clustering in metric and euclidean spaces and their applications. SIAM Journal on Computing, 39(3):923-947, 2009. URL: https://doi.org/10.1137/070699007.
  10. Vincent Cohen-Addad, Niklas Oskar D Hjuler, Nikos Parotsidis, David Saulpic, and Chris Schwiegelshohn. Fully dynamic consistent facility location. In Advances in Neural Information Processing Systems 32, pages 3255-3265. Curran Associates, Inc., 2019. URL: http://papers.nips.cc/paper/8588-fully-dynamic-consistent-facility-location.pdf.
  11. David Eppstein, Zvi Galil, Giuseppe F. Italiano, and Amnon Nissenzweig. Sparsification - a technique for speeding up dynamic graph algorithms. J. ACM, 44(5):669–696, 1997. Google Scholar
  12. Dan Feldman and Michael Langberg. A unified framework for approximating and clustering data. In Proceedings of the 43rd ACM Symposium on Theory of Computing, STOC 2011, San Jose, CA, USA, 6-8 June 2011, pages 569-578, 2011. Google Scholar
  13. Dan Feldman, Melanie Schmidt, and Christian Sohler. Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6-8, 2013, pages 1434-1453. SIAM, 2013. Google Scholar
  14. Gereon Frahling and Christian Sohler. Coresets in dynamic geometric data streams. In Proceedings of the 37th Annual ACM Symposium on Theory of Computing, Baltimore, MD, USA, May 22-24, 2005, pages 209-217. ACM, 2005. Google Scholar
  15. Anupam Gupta and Kanat Tangwongsan. Simpler analyses of local search algorithms for facility location, 2008. URL: http://arxiv.org/abs/0809.2554.
  16. Sariel Har-Peled and Akash Kushal. Smaller coresets for k-median and k-means clustering. Discret. Comput. Geom., 37(1):3-19, 2007. Google Scholar
  17. Sariel Har-Peled and Soham Mazumdar. On coresets for k-means and k-median clustering. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, STOC ’04, page 291–300. Association for Computing Machinery, 2004. Google Scholar
  18. Monika Henzinger, Sebastian Krinninger, Danupon Nanongkai, and Thatchaphol Saranurak. Unifying and strengthening hardness for dynamic problems via the online matrix-vector multiplication conjecture. In Symposium on Theory of Computing (STOC), pages 21-30, 2015. URL: https://doi.org/10.1145/2746539.2746609.
  19. Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems. In Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pages 731-740. ACM, 2002. Google Scholar
  20. Kamal Jain and Vijay V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation. J. ACM, 48(2):274-296, 2001. Google Scholar
  21. Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. A local search approximation algorithm for k-means clustering. Comput. Geom., 28(2-3):89-112, 2004. Google Scholar
  22. Shi Li and Ola Svensson. Approximating k-median via pseudo-approximation. SIAM J. Comput., 45(2):530-547, 2016. Google Scholar
  23. Ramgopal R. Mettu and C. Greg Plaxton. Optimal time bounds for approximate clustering. Mach. Learn., 56(1-3):35-60, 2004. Google Scholar
  24. Christian Sohler and David P. Woodruff. Strong coresets for k-median and subspace approximation: Goodbye dimension. In 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7-9, 2018, pages 802-813. IEEE Computer Society, 2018. Google Scholar