Improved Algorithms for Maximum Coverage in Dynamic and Random Order Streams

Authors Amit Chakrabarti , Andrew McGregor , Anthony Wirth



PDF
Thumbnail PDF

File

LIPIcs.ESA.2024.40.pdf
  • Filesize: 0.81 MB
  • 15 pages

Document Identifiers

Author Details

Amit Chakrabarti
  • Department of Computer Science, Dartmouth College, Hanover, NH, USA
Andrew McGregor
  • Manning College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, USA
Anthony Wirth
  • School of Computing and Information Systems, The University of Melbourne, Australia
  • School of Computer Science, The University of Sydney, Australia

Acknowledgements

We thank the reviewers for helpful comments.

Cite AsGet BibTex

Amit Chakrabarti, Andrew McGregor, and Anthony Wirth. Improved Algorithms for Maximum Coverage in Dynamic and Random Order Streams. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 40:1-40:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ESA.2024.40

Abstract

The maximum coverage problem is to select k sets, from a collection of m sets, such that the cardinality of their union, in a universe of size n, is maximized. We consider (1-1/e-ε)-approximation algorithms for this NP-hard problem in three standard data stream models. 1) Dynamic Model. The stream consists of a sequence of sets being inserted and deleted. Our multi-pass algorithm uses ε^{-2} k ⋅ polylog(n,m) space. The best previous result (Assadi and Khanna, SODA 2018) used (n +ε^{-4} k) polylog(n,m) space. While both algorithms use O(ε^{-1} log m) passes, our analysis shows that, when ε ≤ 1/log log m, it is possible to reduce the number of passes by a 1/log log m factor without incurring additional space. 2) Random Order Model. In this model, there are no deletions, and the sets forming the instance are uniformly randomly permuted to form the input stream. We show that a single pass and k polylog(n,m) space suffices for arbitrary small constant ε. The best previous result, by Warneke et al. (ESA 2023), used k² polylog(n,m) space. 3) Insert-Only Model. Lastly, our results, along with numerous previous results, use a sub-sampling technique introduced by McGregor and Vu (ICDT 2017) to sparsify the input instance. We explain how this technique and others used in the paper can be implemented such that the amortized update time of our algorithm is polylogarithmic. This also implies an improvement of the state-of-the-art insert only algorithms in terms of the update time: polylog(m,n) update time suffices, whereas the best previous result by Jaud et al. (SEA 2023) required update time that was linear in k.

Subject Classification

ACM Subject Classification
  • Theory of computation → Sketching and sampling
Keywords
  • Data Stream Computation
  • Maximum Coverage
  • Submodular Maximization

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Shipra Agrawal, Mohammad Shadravan, and Cliff Stein. Submodular secretary problem with shortlists. In Avrim Blum, editor, 10th Innovations in Theoretical Computer Science Conference, ITCS 2019, January 10-12, 2019, San Diego, California, USA, volume 124 of LIPIcs, pages 1:1-1:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. URL: https://doi.org/10.4230/LIPICS.ITCS.2019.1.
  2. Aris Anagnostopoulos, Luca Becchetti, Ilaria Bordino, Stefano Leonardi, Ida Mele, and Piotr Sankowski. Stochastic query covering for fast approximate document retrieval. ACM Transactions on Information Systems (TOIS), 33(3):1-35, 2015. Google Scholar
  3. Sepehr Assadi and Sanjeev Khanna. Tight bounds on the round complexity of the distributed maximum coverage problem. In Artur Czumaj, editor, Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 2412-2431. SIAM, 2018. URL: https://doi.org/10.1137/1.9781611975031.155.
  4. Sepehr Assadi, Sanjeev Khanna, and Yang Li. Tight bounds for single-pass streaming complexity of the set cover problem. In STOC, pages 698-711. ACM, 2016. Google Scholar
  5. Giorgio Ausiello, Nicolas Boria, Aristotelis Giannakos, Giorgio Lucarelli, and Vangelis Th. Paschos. Online maximum k-coverage. Discrete Applied Mathematics, 160(13-14):1901-1913, 2012. Google Scholar
  6. Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 671-680, 2014. Google Scholar
  7. MohammadHossein Bateni, Hossein Esfandiari, and Vahab S. Mirrokni. Almost optimal streaming algorithms for coverage problems. In Christian Scheideler and Mohammad Taghi Hajiaghayi, editors, Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2017, Washington DC, USA, July 24-26, 2017, pages 13-23. ACM, 2017. URL: https://doi.org/10.1145/3087556.3087585.
  8. Amit Chakrabarti and Anthony Wirth. Incidence geometries and the pass complexity of semi-streaming set cover. In SODA, pages 1365-1373. SIAM, 2016. Google Scholar
  9. Graham Cormode, Howard J. Karloff, and Anthony Wirth. Set cover algorithms for very large datasets. In CIKM, pages 479-488. ACM, 2010. Google Scholar
  10. Yuval Emek and Adi Rosén. Semi-streaming set cover - (extended abstract). In ICALP (1), volume 8572 of Lecture Notes in Computer Science, pages 453-464. Springer, 2014. Google Scholar
  11. Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634-652, 1998. Google Scholar
  12. Moran Feldman, Ashkan Norouzi-Fard, Ola Svensson, and Rico Zenklusen. The one-way communication complexity of submodular maximization with applications to streaming and robustness. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 1363-1374, 2020. Google Scholar
  13. Sariel Har-Peled, Piotr Indyk, Sepideh Mahabadi, and Ali Vakilian. Towards tight bounds for the streaming set cover problem. In PODS, pages 371-383. ACM, 2016. Google Scholar
  14. Dorit S Hochbaum and Anu Pathria. Analysis of the greedy approach in problems of maximum k-coverage. Naval Research Logistics (NRL), 45(6):615-627, 1998. Google Scholar
  15. Piotr Indyk and Ali Vakilian. Tight trade-offs for the maximum k-coverage problem in the general streaming model. In 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 200-217, 2019. Google Scholar
  16. Stephen Jaud, Anthony Wirth, and Farhana Choudhury. Maximum Coverage in Sublinear Space, Faster. In 21st International Symposium on Experimental Algorithms (SEA 2023), volume 265, pages 21:1-21:20, Dagstuhl, Germany, 2023. URL: https://doi.org/10.4230/LIPIcs.SEA.2023.21.
  17. Hossein Jowhari, Mert Sağlam, and Gábor Tardos. Tight bounds for lp samplers, finding duplicates in streams, and related problems. In Maurizio Lenzerini and Thomas Schwentick, editors, Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2011, June 12-16, 2011, Athens, Greece, pages 49-58. ACM, 2011. URL: https://doi.org/10.1145/1989284.1989289.
  18. Daniel M Kane, Jelani Nelson, Ely Porat, and David P Woodruff. Fast moment estimation in data streams in optimal space. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 745-754, 2011. Google Scholar
  19. Richard M Karp. Reducibility among combinatorial problems. In Complexity of Computer Computations, pages 85-103. Springer, 1972. Google Scholar
  20. David Kempe, Jon M. Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. Theory of Computing, 11:105-147, 2015. Google Scholar
  21. Sanjeev Khanna, Christian Konrad, and Cezar-Mihail Alexandru. Set cover in the one-pass edge-arrival streaming model. In Floris Geerts, Hung Q. Ngo, and Stavros Sintos, editors, Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2023, Seattle, WA, USA, June 18-23, 2023, pages 127-139. ACM, 2023. URL: https://doi.org/10.1145/3584372.3588678.
  22. Andreas Krause and Daniel Golovin. Submodular function maximization. In Lucas Bordeaux, Youssef Hamadi, and Pushmeet Kohli, editors, Tractability: Practical Approaches to Hard Problems, pages 71-104. Cambridge University Press, 2014. URL: https://doi.org/10.1017/CBO9781139177801.004.
  23. Andreas Krause and Carlos Guestrin. Near-optimal observation selection using submodular functions. In AAAI, pages 1650-1654. AAAI Press, 2007. Google Scholar
  24. Ching Lih Lim, Alistair Moffat, and Anthony Wirth. Lazy and eager approaches for the set cover problem. In Proceedings of the 37th Australasian Computer Science Conference, pages 19-27, 2014. Google Scholar
  25. Paul Liu, Aviad Rubinstein, Jan Vondrák, and Junyao Zhao. Cardinality constrained submodular maximization for random streams. In Marc'Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 6491-6502, 2021. URL: https://proceedings.neurips.cc/paper/2021/hash/333222170ab9edca4785c39f55221fe7-Abstract.html.
  26. Andrew McGregor, David Tench, Sofya Vorotnikova, and Hoa T. Vu. Densest subgraph in dynamic graph streams. In MFCS (2), volume 9235 of Lecture Notes in Computer Science, pages 472-482. Springer, 2015. Google Scholar
  27. Andrew McGregor and Hoa T. Vu. Better streaming algorithms for the maximum coverage problem. Theory Comput. Syst., 63(7):1595-1619, 2019. URL: https://doi.org/10.1007/S00224-018-9878-X.
  28. Ashkan Norouzi-Fard, Jakub Tarnawski, Slobodan Mitrovic, Amir Zandieh, Aidasadat Mousavifar, and Ola Svensson. Beyond 1/2-approximation for submodular maximization on massive data streams. In Jennifer G. Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, volume 80 of Proceedings of Machine Learning Research, pages 3826-3835. PMLR, 2018. URL: http://proceedings.mlr.press/v80/norouzi-fard18a.html.
  29. Barna Saha and Lise Getoor. On maximum coverage in the streaming model & application to multi-topic blog-watch. In SDM, pages 697-708. SIAM, 2009. Google Scholar
  30. Joachim von zur Gathen and Jürgen Gerhard. Modern Computer Algebra. Cambridge University Press, 1999. Google Scholar
  31. Rowan Warneke, Farhana Murtaza Choudhury, and Anthony Wirth. Maximum coverage in random-arrival streams. In Inge Li Gørtz, Martin Farach-Colton, Simon J. Puglisi, and Grzegorz Herman, editors, 31st Annual European Symposium on Algorithms, ESA 2023, September 4-6, 2023, Amsterdam, The Netherlands, volume 274 of LIPIcs, pages 102:1-102:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023. URL: https://doi.org/10.4230/LIPICS.ESA.2023.102.
  32. Mark N. Wegman and Larry Carter. New classes and applications of hash functions. In Proc. 20th Annual IEEE Symposium on Foundations of Computer Science, pages 175-182, 1979. Google Scholar
  33. Huiwen Yu and Dayu Yuan. Set coverage problems in a one-pass data stream. In SDM, pages 758-766. SIAM, 2013. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail