Fully-Dynamic Coresets

Henzinger, Monika; Kale, Sagar

doi:10.4230/LIPIcs.ESA.2020.57

ACM Subject Classification

Theory of computation → Facility location and clustering

Keywords

Clustering
Coresets
Dynamic Algorithms

Abstract

With input sizes becoming massive, coresets, small yet representative summaries of the input, are more relevant than ever. A weighted set C_w that is a subset of the input is an ε-coreset if the cost of any feasible solution S with respect to C_w is within a factor of [1 ± ε] of the cost of S with respect to the original input. We give a very general technique to compute coresets in the fully-dynamic setting, where input points can be added or deleted. Given a static (i.e., not dynamic) ε-coreset-construction algorithm that runs in time t(n, ε, λ) and computes a coreset of size s(n, ε, λ), where n is the number of input points and 1-λ is the success probability, we give a fully-dynamic algorithm that computes an ε-coreset with worst-case update time O((log n) ⋅ t(s(n, ε/log n, λ/n), ε/log n, λ/n)) (this bound is stated informally), where the success probability is 1-λ. Our technique is a fully-dynamic analog of the merge-and-reduce technique due to Har-Peled and Mazumdar [Har-Peled and Mazumdar, 2004], which is based on a technique of Bentley and Saxe [Jon Louis Bentley and James B. Saxe, 1980] and applies to the insertion-only setting, where points can only be added. Although our space usage is O(n), our technique works in the presence of an adaptive adversary, and we show that Ω(n) space is required when the adversary is adaptive. As a concrete implication of our technique, using the result of Braverman et al. [Braverman et al., 2016], we get fully-dynamic ε-coreset-construction algorithms for k-median and k-means with worst-case update time O(ε^{-2} k² log⁵ n log³ k) and coreset size O(ε^{-2} k log n log² k), ignoring log log n and log(1/ε) factors and assuming that ε = Ω(1/poly(n)) and λ = Ω(1/poly(n)) (very weak assumptions made only to make these bounds easy to parse). This yields the first fully-dynamic constant-approximation algorithms for k-median and k-means with update times O(poly(k, log n, ε^{-1})).
In particular, the dependence on k is only quadratic, and the bounds are worst-case. The best previous bound for both problems was amortized O(n log n), by Cohen-Addad et al. [Cohen-Addad et al., 2019], via randomized O(1)-coresets in O(n) space. We also show that, under the OMv conjecture [Monika Henzinger et al., 2015], a fully-dynamic (4 - δ)-approximation algorithm for k-means must have either an amortized update time of Ω(k^{1-γ}) or an amortized query time of Ω(k^{2-γ}), where γ > 0 is a constant.
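The abstract describes the technique as a fully-dynamic analog of insertion-only merge-and-reduce. The Python sketch below illustrates only that classical insertion-only scheme together with the ε-coreset inequality; it is not the authors' fully-dynamic algorithm, which handles deletions with worst-case update time. It assumes 1-D points and the weighted k-median cost. All names (`kmedian_cost`, `is_eps_coreset_for`, `collapse_duplicates`, `MergeAndReduce`, `leaf_size`) are hypothetical, and `collapse_duplicates` is a trivial, exact stand-in for a real static construction such as the one of Braverman et al.

```python
from collections import defaultdict
from typing import Callable, List, Optional, Tuple

# A weighted point set: a list of (1-D point, weight) pairs (illustrative choice).
WeightedSet = List[Tuple[float, float]]


def kmedian_cost(points: WeightedSet, centers: List[float]) -> float:
    """Weighted k-median cost: each point pays weight * distance to its nearest center."""
    return sum(w * min(abs(p - c) for c in centers) for p, w in points)


def is_eps_coreset_for(coreset: WeightedSet, full: WeightedSet,
                       centers: List[float], eps: float) -> bool:
    """Check the eps-coreset inequality for ONE candidate solution `centers`.

    The definition quantifies over every feasible solution, so testing a few
    candidates can refute the coreset property but never prove it.
    """
    full_cost = kmedian_cost(full, centers)
    return abs(kmedian_cost(coreset, centers) - full_cost) <= eps * full_cost


def collapse_duplicates(points: WeightedSet) -> WeightedSet:
    """Hypothetical stand-in for a static coreset construction.

    Merging identical points and summing their weights leaves the cost of
    every solution unchanged, so this is an exact (0-)coreset, but a trivial
    one; a real construction would return a much smaller approximate summary.
    """
    merged = defaultdict(float)
    for p, w in points:
        merged[p] += w
    return sorted(merged.items())


class MergeAndReduce:
    """Insertion-only merge-and-reduce in the Bentley-Saxe style (no deletions).

    levels[i] is either None or a summary of 2**i * leaf_size input points.
    Inserting behaves like incrementing a binary counter: two summaries at the
    same level are merged (concatenated) and reduced into one summary at the
    next level.
    """

    def __init__(self, reduce: Callable[[WeightedSet], WeightedSet],
                 leaf_size: int = 4) -> None:
        self.reduce = reduce
        self.leaf_size = leaf_size
        self.buffer: WeightedSet = []
        self.levels: List[Optional[WeightedSet]] = []

    def insert(self, point: float) -> None:
        self.buffer.append((point, 1.0))
        if len(self.buffer) < self.leaf_size:
            return
        carry = self.reduce(self.buffer)  # summarize a full leaf block
        self.buffer = []
        level = 0
        while True:
            if level == len(self.levels):
                self.levels.append(None)
            slot = self.levels[level]
            if slot is None:
                self.levels[level] = carry
                return
            carry = self.reduce(slot + carry)  # merge two equal-level summaries, then reduce
            self.levels[level] = None
            level += 1

    def coreset(self) -> WeightedSet:
        """Union of the unreduced buffer and all level summaries.

        Summaries of disjoint parts of the input combine by union into a
        summary of the whole input.
        """
        result = list(self.buffer)
        for summary in self.levels:
            if summary is not None:
                result.extend(summary)
        return result


if __name__ == "__main__":
    stream = [0.0, 1.0, 1.0, 5.0, 5.0, 5.0, 9.0, 0.0, 1.0, 9.0]
    mr = MergeAndReduce(collapse_duplicates, leaf_size=4)
    for x in stream:
        mr.insert(x)
    full = [(x, 1.0) for x in stream]
    centers = [1.0, 5.0]
    print(kmedian_cost(full, centers), kmedian_cost(mr.coreset(), centers))  # prints: 10.0 10.0
```

Each input point is re-reduced at most once per level, and there are about log n levels. With an approximate reduce, the error therefore compounds once per level, which is presumably why the bound above invokes the static algorithm with ε/log n.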

Cite As

Monika Henzinger and Sagar Kale. Fully-Dynamic Coresets. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 57:1-57:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020) https://doi.org/10.4230/LIPIcs.ESA.2020.57

Author Details

Monika Henzinger

University of Vienna, Faculty of Computer Science, Austria

Sagar Kale

University of Vienna, Faculty of Computer Science, Austria

References

Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for k-means and euclidean k-median by primal-dual algorithms. In 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 61-72. IEEE Computer Society, 2017.
Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristics for k-median and facility location problems. SIAM J. Comput., 33(3):544-562, 2004.
Jon Louis Bentley and James B. Saxe. Decomposable searching problems I: Static-to-dynamic transformation. J. Algorithms, pages 301-358, 1980.
Vladimir Braverman, Dan Feldman, and Harry Lang. New Frameworks for Offline and Streaming Coreset Constructions. arXiv e-prints, 2016. URL: http://arxiv.org/abs/1612.00889.
Vladimir Braverman, Gereon Frahling, Harry Lang, Christian Sohler, and Lin F. Yang. Clustering high dimensional dynamic data streams. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 576-585, International Convention Centre, Sydney, Australia, August 06-11 2017. PMLR.
Jaroslaw Byrka, Thomas W. Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median and positive correlation in budgeted optimization. ACM Trans. Algorithms, 13(2):23:1-23:31, 2017.
Moses Charikar and Sudipto Guha. Improved combinatorial algorithms for facility location problems. SIAM J. Comput., 34(4):803-824, 2005.
Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. A constant-factor approximation algorithm for the k-median problem. J. Comput. Syst. Sci., 65(1):129-149, 2002.
Ke Chen. On coresets for k-median and k-means clustering in metric and euclidean spaces and their applications. SIAM J. Comput., 39(3):923-947, 2009. URL: https://doi.org/10.1137/070699007.
Vincent Cohen-Addad, Niklas Oskar D Hjuler, Nikos Parotsidis, David Saulpic, and Chris Schwiegelshohn. Fully dynamic consistent facility location. In Advances in Neural Information Processing Systems 32, pages 3255-3265. Curran Associates, Inc., 2019. URL: http://papers.nips.cc/paper/8588-fully-dynamic-consistent-facility-location.pdf.
David Eppstein, Zvi Galil, Giuseppe F. Italiano, and Amnon Nissenzweig. Sparsification - a technique for speeding up dynamic graph algorithms. J. ACM, 44(5):669-696, 1997.
Dan Feldman and Michael Langberg. A unified framework for approximating and clustering data. In Proceedings of the 43rd ACM Symposium on Theory of Computing, STOC 2011, San Jose, CA, USA, 6-8 June 2011, pages 569-578, 2011.
Dan Feldman, Melanie Schmidt, and Christian Sohler. Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6-8, 2013, pages 1434-1453. SIAM, 2013.
Gereon Frahling and Christian Sohler. Coresets in dynamic geometric data streams. In Proceedings of the 37th Annual ACM Symposium on Theory of Computing, Baltimore, MD, USA, May 22-24, 2005, pages 209-217. ACM, 2005.
Anupam Gupta and Kanat Tangwongsan. Simpler analyses of local search algorithms for facility location, 2008. URL: http://arxiv.org/abs/0809.2554.
Sariel Har-Peled and Akash Kushal. Smaller coresets for k-median and k-means clustering. Discret. Comput. Geom., 37(1):3-19, 2007.
Sariel Har-Peled and Soham Mazumdar. On coresets for k-means and k-median clustering. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, STOC ’04, pages 291-300. Association for Computing Machinery, 2004.
Monika Henzinger, Sebastian Krinninger, Danupon Nanongkai, and Thatchaphol Saranurak. Unifying and strengthening hardness for dynamic problems via the online matrix-vector multiplication conjecture. In Symposium on Theory of Computing (STOC), pages 21-30, 2015. URL: https://doi.org/10.1145/2746539.2746609.
Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems. In Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pages 731-740. ACM, 2002.
Kamal Jain and Vijay V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation. J. ACM, 48(2):274-296, 2001.
Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. A local search approximation algorithm for k-means clustering. Comput. Geom., 28(2-3):89-112, 2004.
Shi Li and Ola Svensson. Approximating k-median via pseudo-approximation. SIAM J. Comput., 45(2):530-547, 2016.
Ramgopal R. Mettu and C. Greg Plaxton. Optimal time bounds for approximate clustering. Mach. Learn., 56(1-3):35-60, 2004.
Christian Sohler and David P. Woodruff. Strong coresets for k-median and subspace approximation: Goodbye dimension. In 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7-9, 2018, pages 802-813. IEEE Computer Society, 2018.