Better Streaming Algorithms for the Maximum Coverage Problem

Authors Andrew McGregor, Hoa T. Vu

Thumbnail PDF


  • Filesize: 0.55 MB
  • 18 pages

Document Identifiers

Author Details

Andrew McGregor
Hoa T. Vu

Cite AsGet BibTex

Andrew McGregor and Hoa T. Vu. Better Streaming Algorithms for the Maximum Coverage Problem. In 20th International Conference on Database Theory (ICDT 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 68, pp. 22:1-22:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


We study the classic NP-Hard problem of finding the maximum k-set coverage in the data stream model: given a set system of m sets that are subsets of a universe {1,...,n}, find the k sets that cover the most number of distinct elements. The problem can be approximated up to a factor 1-1/e in polynomial time. In the streaming-set model, the sets and their elements are revealed online. The main goal of our work is to design algorithms, with approximation guarantees as close as possible to 1-1/e, that use sublinear space o(mn). Our main results are: 1) Two (1-1/e-epsilon) approximation algorithms: One uses O(1/epsilon) passes and O(k/epsilon^2 polylog(m,n)) space whereas the other uses only a single pass but O(m/epsilon^2 polylog(m,n)) space. 2) We show that any approximation factor better than (1-(1-1/k)^k) in constant passes require space that is linear in m for constant k even if the algorithm is allowed unbounded processing time. We also demonstrate a single-pass, (1-epsilon) approximation algorithm using O(m/epsilon^2 min(k,1/epsilon) polylog(m,n)) space. We also study the maximum k-vertex coverage problem in the dynamic graph stream model. In this model, the stream consists of edge insertions and deletions of a graph on N vertices. The goal is to find k vertices that cover the most number of distinct edges. We show that any constant approximation in constant passes requires space that is linear in N for constant k whereas O(N/epsilon^2 polylog(m,n)) space is sufficient for a (1-epsilon) approximation and arbitrary k in a single pass. For regular graphs, we show that O(k/epsilon^3 polylog(m,n)) space is sufficient for a (1-epsilon) approximation in a single pass. We generalize this to a K-epsilon approximation when the ratio between the minimum and maximum degree is bounded below by K.
  • algorithms
  • data streams
  • approximation
  • maximum coverage


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Alexander A. Ageev and Maxim Sviridenko. Approximation algorithms for maximum coverage and max cut with given sizes of parts. In IPCO, volume 1610 of Lecture Notes in Computer Science, pages 17-30. Springer, 1999. Google Scholar
  2. Kook Jin Ahn, Graham Cormode, Sudipto Guha, Andrew McGregor, and Anthony Wirth. Correlation clustering in data streams. In ICML, volume 37 of JMLR Workshop and Conference Proceedings, pages 2237-2246., 2015. Google Scholar
  3. Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Analyzing graph structure via linear measurements. In SODA, pages 459-467. SIAM, 2012. Google Scholar
  4. Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Graph sketches: sparsification, spanners, and subgraphs. In PODS, pages 5-14. ACM, 2012. Google Scholar
  5. Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Spectral sparsification in dynamic graph streams. In APPROX-RANDOM, volume 8096 of Lecture Notes in Computer Science, pages 1-10. Springer, 2013. Google Scholar
  6. Aris Anagnostopoulos, Luca Becchetti, Ilaria Bordino, Stefano Leonardi, Ida Mele, and Piotr Sankowski. Stochastic query covering for fast approximate document retrieval. ACM Trans. Inf. Syst., 33(3):11:1-11:35, 2015. Google Scholar
  7. Sepehr Assadi, Sanjeev Khanna, and Yang Li. Tight bounds for single-pass streaming complexity of the set cover problem. In STOC, pages 698-711. ACM, 2016. Google Scholar
  8. Sepehr Assadi, Sanjeev Khanna, Yang Li, and Grigory Yaroslavtsev. Maximum matchings in dynamic graph streams and the simultaneous communication model. In SODA, pages 1345-1364. SIAM, 2016. Google Scholar
  9. Giorgio Ausiello, Nicolas Boria, Aristotelis Giannakos, Giorgio Lucarelli, and Vangelis Th. Paschos. Online maximum k-coverage. Discrete Applied Mathematics, 160(13-14):1901-1913, 2012. Google Scholar
  10. Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. Streaming submodular maximization: massive data summarization on the fly. In KDD, pages 671-680. ACM, 2014. Google Scholar
  11. Ashwinkumar Badanidiyuru and Jan Vondrák. Fast algorithms for maximizing submodular functions. In SODA, pages 1497-1514. SIAM, 2014. Google Scholar
  12. MohammadHossein Bateni, Hossein Esfandiari, and Vahab S. Mirrokni. Almost optimal streaming algorithms for coverage problems. CoRR, abs/1610.08096, 2016. Google Scholar
  13. Sayan Bhattacharya, Monika Henzinger, Danupon Nanongkai, and Charalampos E. Tsourakakis. Space- and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In STOC, pages 173-182. ACM, 2015. Google Scholar
  14. Édouard Bonnet, Bruno Escoffier, Vangelis Th. Paschos, and Georgios Stamoulis. A 0.821-ratio purely combinatorial algorithm for maximum k-vertex cover in bipartite graphs. In LATIN, volume 9644 of Lecture Notes in Computer Science, pages 235-248. Springer, 2016. Google Scholar
  15. Bugra Caskurlu, Vahan Mkrtchyan, Ojas Parekh, and K. Subramani. On partial vertex cover and budgeted maximum coverage problems in bipartite graphs. In IFIP TCS, volume 8705 of Lecture Notes in Computer Science, pages 13-26. Springer, 2014. Google Scholar
  16. Amit Chakrabarti, Graham Cormode, and Andrew McGregor. Robust lower bounds for communication and stream computation. Electronic Colloquium on Computational Complexity (ECCC), 18:62, 2011. Google Scholar
  17. Amit Chakrabarti, Subhash Khot, and Xiaodong Sun. Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In IEEE Conference on Computational Complexity, pages 107-117. IEEE Computer Society, 2003. Google Scholar
  18. Amit Chakrabarti and Anthony Wirth. Incidence geometries and the pass complexity of semi-streaming set cover. In SODA, pages 1365-1373. SIAM, 2016. Google Scholar
  19. Chandra Chekuri, Shalmoli Gupta, and Kent Quanrud. Streaming algorithms for submodular function maximization. In ICALP (1), volume 9134 of Lecture Notes in Computer Science, pages 318-330. Springer, 2015. Google Scholar
  20. Rajesh Chitnis, Graham Cormode, Hossein Esfandiari, MohammadTaghi Hajiaghayi, Andrew McGregor, Morteza Monemizadeh, and Sofya Vorotnikova. Kernelization via sampling with applications to finding matchings and related problems in dynamic graph streams. In SODA, pages 1326-1344. SIAM, 2016. Google Scholar
  21. Graham Cormode, Mayur Datar, Piotr Indyk, and S. Muthukrishnan. Comparing data streams using hamming norms (how to zero in). IEEE Trans. Knowl. Data Eng., 15(3):529-540, 2003. Google Scholar
  22. Graham Cormode, Howard J. Karloff, and Anthony Wirth. Set cover algorithms for very large datasets. In CIKM, pages 479-488. ACM, 2010. Google Scholar
  23. Yuval Emek and Adi Rosén. Semi-streaming set cover - (extended abstract). In ICALP (1), volume 8572 of Lecture Notes in Computer Science, pages 453-464. Springer, 2014. Google Scholar
  24. Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634-652, 1998. Google Scholar
  25. Sudipto Guha, Andrew McGregor, and David Tench. Vertex and hyperedge connectivity in dynamic graph streams. In PODS, pages 241-247. ACM, 2015. Google Scholar
  26. Sariel Har-Peled, Piotr Indyk, Sepideh Mahabadi, and Ali Vakilian. Towards tight bounds for the streaming set cover problem. In PODS, pages 371-383. ACM, 2016. Google Scholar
  27. Michael Kapralov, Yin Tat Lee, Cameron Musco, Christopher Musco, and Aaron Sidford. Single pass spectral sparsification in dynamic streams. In FOCS, pages 561-570. IEEE Computer Society, 2014. Google Scholar
  28. Michael Kapralov and David P. Woodruff. Spanners and sparsifiers in dynamic streams. In PODC, pages 272-281. ACM, 2014. Google Scholar
  29. David Kempe, Jon M. Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. Theory of Computing, 11:105-147, 2015. Google Scholar
  30. Samir Khuller, Anna Moss, and Joseph Naor. The budgeted maximum coverage problem. Inf. Process. Lett., 70(1):39-45, 1999. Google Scholar
  31. Dmitry Kogan and Robert Krauthgamer. Sketching cuts in graphs and hypergraphs. In 6th Innovations in Theoretical Computer Science, 2015. Google Scholar
  32. Christian Konrad. Maximum matching in turnstile streams. In ESA, volume 9294 of Lecture Notes in Computer Science, pages 840-852. Springer, 2015. Google Scholar
  33. Andreas Krause and Carlos Guestrin. Near-optimal observation selection using submodular functions. In AAAI, pages 1650-1654. AAAI Press, 2007. Google Scholar
  34. Ravi Kumar, Benjamin Moseley, Sergei Vassilvitskii, and Andrea Vattani. Fast greedy algorithms in mapreduce and streaming. TOPC, 2(3):14, 2015. Google Scholar
  35. Andrew McGregor. Graph stream algorithms: a survey. SIGMOD Record, 43(1):9-20, 2014. Google Scholar
  36. Andrew McGregor, David Tench, Sofya Vorotnikova, and Hoa T. Vu. Densest subgraph in dynamic graph streams. In MFCS (2), volume 9235 of Lecture Notes in Computer Science, pages 472-482. Springer, 2015. Google Scholar
  37. Andrew McGregor, Sofya Vorotnikova, and Hoa T. Vu. Better algorithms for counting triangles in data streams. In PODS, pages 401-411. ACM, 2016. Google Scholar
  38. Andrew McGregor and Hoa T. Vu. Better streaming algorithms for the maximum coverage problem. CoRR, abs/1610.06199, 2016. URL:
  39. Alessandro Panconesi and Aravind Srinivasan. Randomized distributed edge coloring via an extension of the chernoff-hoeffding bounds. SIAM J. Comput., 26(2):350-368, 1997. Google Scholar
  40. Jaikumar Radhakrishnan and Saswata Shannigrahi. Streaming algorithms for 2-coloring uniform hypergraphs. In Algorithms and Data Structures - 12th International Symposium, WADS 2011, New York, NY, USA, August 15-17, 2011. Proceedings, pages 667-678, 2011. URL:
  41. Barna Saha and Lise Getoor. On maximum coverage in the streaming model & application to multi-topic blog-watch. In SDM, pages 697-708. SIAM, 2009. Google Scholar
  42. Jeanette P. Schmidt, Alan Siegel, and Aravind Srinivasan. Chernoff-hoeffding bounds for applications with limited independence. SIAM J. Discrete Math., 8(2):223-250, 1995. Google Scholar
  43. Daniel A. Spielman and Shang-Hua Teng. Spectral sparsification of graphs. SIAM J. Comput., 40(4):981-1025, 2011. Google Scholar
  44. He Sun. Counting hypergraphs in data streams. CoRR, abs/1304.7456, 2013. URL:
  45. Huiwen Yu and Dayu Yuan. Set coverage problems in a one-pass data stream. In SDM, pages 758-766. SIAM, 2013. Google Scholar