**Published in:** LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)

Given a collection of m sets, each a subset of a universe {1, …, n}, maximum coverage is the problem of choosing k sets whose union has the largest cardinality. A simple greedy algorithm achieves an approximation factor of 1 - 1 / e ≈ 0.632, which is the best possible polynomial-time approximation unless P = NP.
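Below is a minimal offline sketch of the greedy algorithm referred to above. It assumes the sets fit in memory as Python sets, so it is not a streaming algorithm. In each of k rounds it picks the set with the largest marginal gain, that is, the set covering the most not-yet-covered elements. The function name is hypothetical.

```python
from typing import List, Set, Tuple


def greedy_max_coverage(sets: List[Set[int]], k: int) -> Tuple[List[int], Set[int]]:
    """Offline greedy for maximum coverage; returns chosen indices and their union."""
    chosen: List[int] = []
    covered: Set[int] = set()
    for _ in range(min(k, len(sets))):
        # Pick the unchosen set with the largest marginal gain (ties: lowest index).
        best = max(
            (i for i in range(len(sets)) if i not in chosen),
            key=lambda i: len(sets[i] - covered),
        )
        if not sets[best] - covered:
            break  # no remaining set adds a new element
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered
```

For `sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]` and `k = 2`, the sketch first picks index 2 (gain 4), then index 0 (gain 3), covering all seven elements. Each round scans every set, and the whole input is stored. The streaming algorithms discussed below aim to avoid exactly this cost.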
In the streaming setting, information about the input is revealed gradually, in an online fashion. In the set-streaming model, each set is listed contiguously in the stream. In the more general edge-streaming model, the stream is composed of set-element pairs, denoting membership. The overall goal in the streaming setting is to design algorithms that use sublinear space in the size of the input. An interesting line of research is to design algorithms with space complexity polylogarithmic in the size of the input (i.e., polylogarithmic in both n and m); we call such algorithms low-space. In the set-streaming model, it is known that 1/2 is the best possible low-space approximation. In the edge-streaming model, no low-space algorithm can achieve a nontrivial approximation factor.
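A small sketch can make the two stream models concrete. The function names are hypothetical. The same input, a mapping from set ids to elements, is serialized either as whole sets or as (set id, element) membership pairs. Passing a seeded `random.Random` shuffles the stream uniformly. This models the random-arrival order assumed in the next paragraph.

```python
import random
from typing import Dict, List, Optional, Set, Tuple


def set_stream(sets: Dict[int, Set[int]],
               rng: Optional[random.Random] = None) -> List[Tuple[int, Set[int]]]:
    """Set-streaming model: each set arrives whole, as one (set_id, elements) item."""
    stream = [(sid, set(elems)) for sid, elems in sets.items()]
    if rng is not None:
        rng.shuffle(stream)  # random arrival: uniformly random order of the sets
    return stream


def edge_stream(sets: Dict[int, Set[int]],
                rng: Optional[random.Random] = None) -> List[Tuple[int, int]]:
    """Edge-streaming model: one (set_id, element) membership pair per item."""
    stream = [(sid, e) for sid, elems in sets.items() for e in sorted(elems)]
    if rng is not None:
        rng.shuffle(stream)  # random arrival: a set's pairs are no longer contiguous
    return stream
```

For `sets = {0: {1, 2}, 1: {2, 3, 4}}`, `edge_stream(sets)` returns `[(0, 1), (0, 2), (1, 2), (1, 3), (1, 4)]`. With a random generator, the pairs of set 1 may be interleaved with those of set 0, so no set is ever seen contiguously.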
We study the problem under the assumption that the order in which the stream arrives is chosen uniformly at random. Our main results are as follows.
- In the random-arrival set-streaming model, we give two new algorithms to show that low space is sufficient to break the 1/2 barrier. The first achieves an approximation factor of 1/2 + c₁ using Õ(k²) space, where c₁ > 0 is a small constant and Õ(⋅) notation suppresses polylogarithmic factors; the second achieves a factor of 1 - 1 / e - ε - o(1) using Õ(k² ε^{-3}) space, where the o(1) term is a function of k. This is essentially the optimal bound, as breaking the 1-1/e barrier is known to require high space.
- In the random-arrival edge-streaming model, we show that for all fixed α > 0 and δ > 0, any algorithm that α-approximates maximum coverage with probability at least 0.9 requires Ω(m^{1-δ}) space (i.e., high space), even for the special case of k = 1.
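The k = 1 case gives some intuition for this bound. The straightforward exact algorithm in the edge-streaming model keeps one counter per set, which takes Θ(m) space in the worst case. The lower bound says this baseline is close to optimal in space, even under random arrival and even when only an α-approximation is required. Below is a minimal sketch of that baseline. It assumes each (set id, element) pair appears at most once in the stream, which is our assumption and is not stated in the abstract. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def best_single_set(edge_stream: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Exact k = 1 maximum coverage over an edge stream, using one counter per set."""
    # Assumption: no duplicate (set_id, element) pairs, so a set's coverage
    # equals the number of pairs that mention it.
    counts = Counter()
    for set_id, _element in edge_stream:
        counts[set_id] += 1
    set_id, size = counts.most_common(1)[0]
    return set_id, size
```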

Rowan Warneke, Farhana Choudhury, and Anthony Wirth. Maximum Coverage in Random-Arrival Streams. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 102:1-102:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{warneke_et_al:LIPIcs.ESA.2023.102,
  author = {Warneke, Rowan and Choudhury, Farhana and Wirth, Anthony},
  title = {{Maximum Coverage in Random-Arrival Streams}},
  booktitle = {31st Annual European Symposium on Algorithms (ESA 2023)},
  pages = {102:1--102:15},
  series = {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN = {978-3-95977-295-2},
  ISSN = {1868-8969},
  year = {2023},
  volume = {274},
  editor = {G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address = {Dagstuhl, Germany},
  URL = {https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.102},
  URN = {urn:nbn:de:0030-drops-187559},
  doi = {10.4230/LIPIcs.ESA.2023.102},
  annote = {Keywords: Maximum Coverage, Streaming Algorithm, Random Arrival, Greedy Algorithm, Communication Complexity}
}

**Published in:** LIPIcs, Volume 265, 21st International Symposium on Experimental Algorithms (SEA 2023)

Given a collection of m sets from a universe 𝒰, the Maximum Set Coverage problem is to find k sets whose union has the largest cardinality. This problem is NP-hard, but a polynomial-time algorithm approximates the optimum to within a factor of 1-1/e. However, this algorithm does not scale well with the input size.
In the streaming setting, practical algorithms find high-quality solutions, but their space complexity scales linearly with the size of the universe, n = |𝒰|. However, one randomized streaming algorithm has been shown to produce a (1-1/e-ε)-approximation of the optimal solution with space complexity that scales only polylogarithmically in m and n. To achieve such low space complexity, its authors used two techniques in their multi-pass approach:
- F₀-sketching, which accurately estimates the number of distinct elements in a set while using less space than storing the set itself.
- Subsampling, which solves the problem only on a sampled subset of the universe. It is implemented using γ-wise independent hash functions.
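The k-minimum-values (KMV) estimator is one standard distinct-elements sketch, and it can illustrate F₀-sketching. The abstract does not say which sketch the original algorithm uses, so this is an illustrative stand-in rather than the paper's method. The sketch keeps only the t smallest hash values seen and estimates the count as (t - 1) divided by the t-th smallest value. We call the parameter t to avoid clashing with the k of maximum coverage. SHA-256 from `hashlib` stands in for an idealized random hash.

```python
import hashlib
import heapq
from typing import List, Set


def _hash01(x: int) -> float:
    """Map an element to a pseudo-random value in (0, 1] (SHA-256 as a stand-in hash)."""
    digest = hashlib.sha256(str(x).encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


class KMVSketch:
    """Distinct-elements (F0) sketch keeping the t smallest hash values: O(t) space."""

    def __init__(self, t: int = 256) -> None:
        self.t = t
        self._heap: List[float] = []    # negated hash values, so -heap[0] is the largest kept
        self._kept: Set[float] = set()  # the hash values currently in the heap

    def add(self, x: int) -> None:
        h = _hash01(x)
        if h in self._kept:
            return  # repeated element: same hash, already counted
        if len(self._heap) < self.t:
            heapq.heappush(self._heap, -h)
            self._kept.add(h)
        elif h < -self._heap[0]:
            # h is among the t smallest: evict the current largest kept value.
            evicted = -heapq.heapreplace(self._heap, -h)
            self._kept.discard(evicted)
            self._kept.add(h)

    def estimate(self) -> float:
        if len(self._heap) < self.t:
            return float(len(self._heap))  # fewer than t distinct hashes: exact count
        return (self.t - 1) / -self._heap[0]  # (t - 1) / (t-th smallest hash)
```

The relative error shrinks roughly like 1/√t, so t trades space for accuracy. Repeated elements hash to the same value, so they never change the sketch.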
This article focuses on the sublinear-space algorithm and highlights the time cost of these two techniques, especially subsampling. We present optimizations that significantly reduce the time complexity of the algorithm. First, we give optimizations that preserve the space complexity, number of passes, and approximation quality of the original algorithm. In particular, we reanalyze the error bounds to show that the original independence factor of Ω(ε^{-2} k log m) can be reduced to Ω(k log m); we also show how F₀-sketching can be removed. Second, we derive a new lower bound on the probability of producing a (1-1/e-ε)-approximation using only pairwise independence: 1 - 4/(c k log m), compared with 1 - 2e/m^{ck/6} under Ω(k log m)-wise independence.
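A standard way to build γ-wise independent hash functions is to evaluate a random polynomial of degree γ - 1 over a prime field. With γ = 2, this gives a pairwise-independent sampler. Below is a minimal sketch. The choices are ours, not necessarily the paper's: integer element ids below the Mersenne prime P, and a threshold rule that keeps each element with probability about p. The names `PolyHash` and `subsample` are hypothetical. Because one hash function is applied to every set, an element is either kept in all sets or dropped from all of them, which restricts the problem to a sub-universe.

```python
import random
from typing import List, Set

P = (1 << 61) - 1  # Mersenne prime; element ids are assumed to lie in [0, P)


class PolyHash:
    """gamma-wise independent family: random polynomial of degree gamma - 1 over GF(P)."""

    def __init__(self, gamma: int, rng: random.Random) -> None:
        self.coeffs = [rng.randrange(P) for _ in range(gamma)]

    def __call__(self, x: int) -> int:
        # Horner's rule: gamma multiply-add steps per element, so the cost of
        # hashing grows linearly with the independence parameter gamma.
        h = 0
        for c in self.coeffs:
            h = (h * x + c) % P
        return h


def subsample(sets: List[Set[int]], h: PolyHash, p: float) -> List[Set[int]]:
    """Restrict every set to the sampled sub-universe {x : h(x) < p * P}."""
    threshold = int(p * P)
    return [{x for x in s if h(x) < threshold} for s in sets]
```

Since each hash evaluation takes time linear in γ, lowering the required independence from Ω(ε^{-2} k log m) to Ω(k log m), or to γ = 2, directly cuts the per-element cost of subsampling.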
Although these theoretical guarantees are weaker, suggesting that approximation quality might suffer, our algorithms perform well in practice on large streams. Finally, our experimental results show that even a pairwise-independent hash-function sampler does not produce worse solutions than the original algorithm, while running several orders of magnitude faster.

Stephen Jaud, Anthony Wirth, and Farhana Choudhury. Maximum Coverage in Sublinear Space, Faster. In 21st International Symposium on Experimental Algorithms (SEA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 265, pp. 21:1-21:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{jaud_et_al:LIPIcs.SEA.2023.21,
  author = {Jaud, Stephen and Wirth, Anthony and Choudhury, Farhana},
  title = {{Maximum Coverage in Sublinear Space, Faster}},
  booktitle = {21st International Symposium on Experimental Algorithms (SEA 2023)},
  pages = {21:1--21:20},
  series = {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN = {978-3-95977-279-2},
  ISSN = {1868-8969},
  year = {2023},
  volume = {265},
  editor = {Georgiadis, Loukas},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address = {Dagstuhl, Germany},
  URL = {https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2023.21},
  URN = {urn:nbn:de:0030-drops-183715},
  doi = {10.4230/LIPIcs.SEA.2023.21},
  annote = {Keywords: streaming algorithms, subsampling, maximum set cover, k-wise independent hash functions}
}
