Document

# Better Streaming Algorithms for the Maximum Coverage Problem

## File

LIPIcs.ICDT.2017.22.pdf
• Filesize: 0.55 MB
• 18 pages

## Cite As

Andrew McGregor and Hoa T. Vu. Better Streaming Algorithms for the Maximum Coverage Problem. In 20th International Conference on Database Theory (ICDT 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 68, pp. 22:1-22:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.ICDT.2017.22

## Abstract

We study the classic NP-Hard problem of finding the maximum k-set coverage in the data stream model: given a set system of m sets that are subsets of a universe {1,...,n}, find the k sets that cover the most number of distinct elements. The problem can be approximated up to a factor 1-1/e in polynomial time. In the streaming-set model, the sets and their elements are revealed online. The main goal of our work is to design algorithms, with approximation guarantees as close as possible to 1-1/e, that use sublinear space o(mn). Our main results are: 1) Two (1-1/e-epsilon) approximation algorithms: One uses O(1/epsilon) passes and O(k/epsilon^2 polylog(m,n)) space whereas the other uses only a single pass but O(m/epsilon^2 polylog(m,n)) space. 2) We show that any approximation factor better than (1-(1-1/k)^k) in constant passes require space that is linear in m for constant k even if the algorithm is allowed unbounded processing time. We also demonstrate a single-pass, (1-epsilon) approximation algorithm using O(m/epsilon^2 min(k,1/epsilon) polylog(m,n)) space. We also study the maximum k-vertex coverage problem in the dynamic graph stream model. In this model, the stream consists of edge insertions and deletions of a graph on N vertices. The goal is to find k vertices that cover the most number of distinct edges. We show that any constant approximation in constant passes requires space that is linear in N for constant k whereas O(N/epsilon^2 polylog(m,n)) space is sufficient for a (1-epsilon) approximation and arbitrary k in a single pass. For regular graphs, we show that O(k/epsilon^3 polylog(m,n)) space is sufficient for a (1-epsilon) approximation in a single pass. We generalize this to a K-epsilon approximation when the ratio between the minimum and maximum degree is bounded below by K.
##### Keywords
• algorithms
• data streams
• approximation
• maximum coverage

## Metrics

• Access Statistics
• Total Accesses (updated on a weekly basis)
0

## References

1. Alexander A. Ageev and Maxim Sviridenko. Approximation algorithms for maximum coverage and max cut with given sizes of parts. In IPCO, volume 1610 of Lecture Notes in Computer Science, pages 17-30. Springer, 1999.
2. Kook Jin Ahn, Graham Cormode, Sudipto Guha, Andrew McGregor, and Anthony Wirth. Correlation clustering in data streams. In ICML, volume 37 of JMLR Workshop and Conference Proceedings, pages 2237-2246. JMLR.org, 2015.
3. Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Analyzing graph structure via linear measurements. In SODA, pages 459-467. SIAM, 2012.
4. Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Graph sketches: sparsification, spanners, and subgraphs. In PODS, pages 5-14. ACM, 2012.
5. Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Spectral sparsification in dynamic graph streams. In APPROX-RANDOM, volume 8096 of Lecture Notes in Computer Science, pages 1-10. Springer, 2013.
6. Aris Anagnostopoulos, Luca Becchetti, Ilaria Bordino, Stefano Leonardi, Ida Mele, and Piotr Sankowski. Stochastic query covering for fast approximate document retrieval. ACM Trans. Inf. Syst., 33(3):11:1-11:35, 2015.
7. Sepehr Assadi, Sanjeev Khanna, and Yang Li. Tight bounds for single-pass streaming complexity of the set cover problem. In STOC, pages 698-711. ACM, 2016.
8. Sepehr Assadi, Sanjeev Khanna, Yang Li, and Grigory Yaroslavtsev. Maximum matchings in dynamic graph streams and the simultaneous communication model. In SODA, pages 1345-1364. SIAM, 2016.
9. Giorgio Ausiello, Nicolas Boria, Aristotelis Giannakos, Giorgio Lucarelli, and Vangelis Th. Paschos. Online maximum k-coverage. Discrete Applied Mathematics, 160(13-14):1901-1913, 2012.
10. Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. Streaming submodular maximization: massive data summarization on the fly. In KDD, pages 671-680. ACM, 2014.
11. Ashwinkumar Badanidiyuru and Jan Vondrák. Fast algorithms for maximizing submodular functions. In SODA, pages 1497-1514. SIAM, 2014.
12. MohammadHossein Bateni, Hossein Esfandiari, and Vahab S. Mirrokni. Almost optimal streaming algorithms for coverage problems. CoRR, abs/1610.08096, 2016.
13. Sayan Bhattacharya, Monika Henzinger, Danupon Nanongkai, and Charalampos E. Tsourakakis. Space- and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In STOC, pages 173-182. ACM, 2015.
14. Édouard Bonnet, Bruno Escoffier, Vangelis Th. Paschos, and Georgios Stamoulis. A 0.821-ratio purely combinatorial algorithm for maximum k-vertex cover in bipartite graphs. In LATIN, volume 9644 of Lecture Notes in Computer Science, pages 235-248. Springer, 2016.
15. Bugra Caskurlu, Vahan Mkrtchyan, Ojas Parekh, and K. Subramani. On partial vertex cover and budgeted maximum coverage problems in bipartite graphs. In IFIP TCS, volume 8705 of Lecture Notes in Computer Science, pages 13-26. Springer, 2014.
16. Amit Chakrabarti, Graham Cormode, and Andrew McGregor. Robust lower bounds for communication and stream computation. Electronic Colloquium on Computational Complexity (ECCC), 18:62, 2011.
17. Amit Chakrabarti, Subhash Khot, and Xiaodong Sun. Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In IEEE Conference on Computational Complexity, pages 107-117. IEEE Computer Society, 2003.
18. Amit Chakrabarti and Anthony Wirth. Incidence geometries and the pass complexity of semi-streaming set cover. In SODA, pages 1365-1373. SIAM, 2016.
19. Chandra Chekuri, Shalmoli Gupta, and Kent Quanrud. Streaming algorithms for submodular function maximization. In ICALP (1), volume 9134 of Lecture Notes in Computer Science, pages 318-330. Springer, 2015.
20. Rajesh Chitnis, Graham Cormode, Hossein Esfandiari, MohammadTaghi Hajiaghayi, Andrew McGregor, Morteza Monemizadeh, and Sofya Vorotnikova. Kernelization via sampling with applications to finding matchings and related problems in dynamic graph streams. In SODA, pages 1326-1344. SIAM, 2016.
21. Graham Cormode, Mayur Datar, Piotr Indyk, and S. Muthukrishnan. Comparing data streams using hamming norms (how to zero in). IEEE Trans. Knowl. Data Eng., 15(3):529-540, 2003.
22. Graham Cormode, Howard J. Karloff, and Anthony Wirth. Set cover algorithms for very large datasets. In CIKM, pages 479-488. ACM, 2010.
23. Yuval Emek and Adi Rosén. Semi-streaming set cover - (extended abstract). In ICALP (1), volume 8572 of Lecture Notes in Computer Science, pages 453-464. Springer, 2014.
24. Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634-652, 1998.
25. Sudipto Guha, Andrew McGregor, and David Tench. Vertex and hyperedge connectivity in dynamic graph streams. In PODS, pages 241-247. ACM, 2015.
26. Sariel Har-Peled, Piotr Indyk, Sepideh Mahabadi, and Ali Vakilian. Towards tight bounds for the streaming set cover problem. In PODS, pages 371-383. ACM, 2016.
27. Michael Kapralov, Yin Tat Lee, Cameron Musco, Christopher Musco, and Aaron Sidford. Single pass spectral sparsification in dynamic streams. In FOCS, pages 561-570. IEEE Computer Society, 2014.
28. Michael Kapralov and David P. Woodruff. Spanners and sparsifiers in dynamic streams. In PODC, pages 272-281. ACM, 2014.
29. David Kempe, Jon M. Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. Theory of Computing, 11:105-147, 2015.
30. Samir Khuller, Anna Moss, and Joseph Naor. The budgeted maximum coverage problem. Inf. Process. Lett., 70(1):39-45, 1999.
31. Dmitry Kogan and Robert Krauthgamer. Sketching cuts in graphs and hypergraphs. In 6th Innovations in Theoretical Computer Science, 2015.
32. Christian Konrad. Maximum matching in turnstile streams. In ESA, volume 9294 of Lecture Notes in Computer Science, pages 840-852. Springer, 2015.
33. Andreas Krause and Carlos Guestrin. Near-optimal observation selection using submodular functions. In AAAI, pages 1650-1654. AAAI Press, 2007.
34. Ravi Kumar, Benjamin Moseley, Sergei Vassilvitskii, and Andrea Vattani. Fast greedy algorithms in mapreduce and streaming. TOPC, 2(3):14, 2015.
35. Andrew McGregor. Graph stream algorithms: a survey. SIGMOD Record, 43(1):9-20, 2014.
36. Andrew McGregor, David Tench, Sofya Vorotnikova, and Hoa T. Vu. Densest subgraph in dynamic graph streams. In MFCS (2), volume 9235 of Lecture Notes in Computer Science, pages 472-482. Springer, 2015.
37. Andrew McGregor, Sofya Vorotnikova, and Hoa T. Vu. Better algorithms for counting triangles in data streams. In PODS, pages 401-411. ACM, 2016.
38. Andrew McGregor and Hoa T. Vu. Better streaming algorithms for the maximum coverage problem. CoRR, abs/1610.06199, 2016. URL: http://arxiv.org/abs/1610.06199.
39. Alessandro Panconesi and Aravind Srinivasan. Randomized distributed edge coloring via an extension of the chernoff-hoeffding bounds. SIAM J. Comput., 26(2):350-368, 1997.
40. Jaikumar Radhakrishnan and Saswata Shannigrahi. Streaming algorithms for 2-coloring uniform hypergraphs. In Algorithms and Data Structures - 12th International Symposium, WADS 2011, New York, NY, USA, August 15-17, 2011. Proceedings, pages 667-678, 2011. URL: http://dx.doi.org/10.1007/978-3-642-22300-6_57.
41. Barna Saha and Lise Getoor. On maximum coverage in the streaming model & application to multi-topic blog-watch. In SDM, pages 697-708. SIAM, 2009.
42. Jeanette P. Schmidt, Alan Siegel, and Aravind Srinivasan. Chernoff-hoeffding bounds for applications with limited independence. SIAM J. Discrete Math., 8(2):223-250, 1995.
43. Daniel A. Spielman and Shang-Hua Teng. Spectral sparsification of graphs. SIAM J. Comput., 40(4):981-1025, 2011.
44. He Sun. Counting hypergraphs in data streams. CoRR, abs/1304.7456, 2013. URL: http://arxiv.org/abs/1304.7456.
45. Huiwen Yu and Dayu Yuan. Set coverage problems in a one-pass data stream. In SDM, pages 758-766. SIAM, 2013.