Maximum Coverage in the Data Stream Model: Parameterized and Generalized

Authors: Andrew McGregor, David Tench, Hoa T. Vu




Author Details

Andrew McGregor
  • University of Massachusetts Amherst, MA, USA
David Tench
  • Stony Brook University, NY, USA
Hoa T. Vu
  • San Diego State University, CA, USA

Cite As

Andrew McGregor, David Tench, and Hoa T. Vu. Maximum Coverage in the Data Stream Model: Parameterized and Generalized. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ICDT.2021.12

Abstract

We present algorithms for the Max Coverage and Max Unique Coverage problems in the data stream model. The input to both problems consists of m subsets of a universe of size n and a value k ∈ [m]. In Max Coverage, the problem is to find a collection of at most k sets such that the number of elements covered by at least one set is maximized. In Max Unique Coverage, the problem is to find a collection of at most k sets such that the number of elements covered by exactly one set is maximized. These problems are closely related to a range of graph problems including matching, partial vertex cover, and capacitated maximum cut. In the data stream model, we assume k is given and the sets are revealed online. Our goal is to design single-pass algorithms that use space sublinear in the input size. Our main algorithmic results are:

  • If the sets have size at most d, there exist single-pass algorithms using O(d^{d+1} k^d) space that solve both problems exactly. This is optimal up to polylogarithmic factors for constant d.
  • If each element appears in at most r sets, we present single-pass algorithms using Õ(k² r/ε³) space that return a 1+ε approximation in the case of Max Coverage. We also present a single-pass algorithm using slightly more memory, i.e., Õ(k³ r/ε⁴) space, that 1+ε approximates Max Unique Coverage.

In contrast to the above results, when d and r are arbitrary, any constant-pass 1+ε approximation algorithm for either problem requires Ω(ε^{-2}m) space, but a single-pass O(ε^{-2}mk) space algorithm exists. In fact, any constant-pass algorithm with an approximation better than e/(e-1) and e^{1-1/k} for Max Coverage and Max Unique Coverage, respectively, requires Ω(m/k²) space when d and r are unrestricted. En route, we also obtain an algorithm for a parameterized version of the streaming Set Cover problem.
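To make the two objectives concrete, here is a minimal offline sketch that evaluates them and finds an optimum by exhaustive search over all collections of at most k sets. It is not the streaming algorithm from the paper: it keeps every set in memory and runs in time exponential in k, so it only serves as a reference for small instances. The function names (`coverage`, `unique_coverage`, `best_k_sets`) are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Hashable, List, Sequence, Set, Tuple

Objective = Callable[[Sequence[Set[Hashable]]], int]


def coverage(chosen: Sequence[Set[Hashable]]) -> int:
    """Max Coverage objective: elements covered by at least one chosen set."""
    return len(set().union(*chosen))


def unique_coverage(chosen: Sequence[Set[Hashable]]) -> int:
    """Max Unique Coverage objective: elements covered by exactly one chosen set."""
    counts = Counter(x for s in chosen for x in s)
    return sum(1 for c in counts.values() if c == 1)


def best_k_sets(sets: List[Set[Hashable]], k: int,
                objective: Objective) -> Tuple[int, Tuple[int, ...]]:
    """Exhaustively search all collections of AT MOST k sets.

    Sizes below k must be tried because adding a set can lower the unique
    coverage objective. Returns (best value, indices of the chosen sets).
    """
    best_value, best_idx = 0, ()
    for size in range(k + 1):
        for idx in combinations(range(len(sets)), size):
            value = objective([sets[i] for i in idx])
            if value > best_value:  # strict: keep the first optimum found
                best_value, best_idx = value, idx
    return best_value, best_idx


if __name__ == "__main__":
    # Four 2-element sets forming a 4-cycle on the universe {1, 2, 3, 4}.
    sets = [{1, 2}, {2, 3}, {3, 4}, {1, 4}]
    print(best_k_sets(sets, 2, coverage))         # (4, (0, 2))
    print(best_k_sets(sets, 3, unique_coverage))  # (4, (0, 2))
```

In the example, every set has size d = 2 and every element appears in r = 2 sets. With k = 3, every choice of three sets covers some elements twice and achieves unique coverage 2, so the best answer still uses only two sets. This illustrates why the problems ask for "at most k" sets.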

ACM Subject Classification
  • Theory of computation → Sketching and sampling
  • Theory of computation → Approximation algorithms analysis
  • Theory of computation → Parameterized complexity and exact algorithms
Keywords
  • Data streams
  • maximum coverage
  • maximum unique coverage
  • set cover
