Search Results

Documents authored by Vu, Hoa T.


Document
Maximum Coverage in the Data Stream Model: Parameterized and Generalized

Authors: Andrew McGregor, David Tench, and Hoa T. Vu

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
We present algorithms for the Max Coverage and Max Unique Coverage problems in the data stream model. The input to both problems are m subsets of a universe of size n and a value k ∈ [m]. In Max Coverage, the problem is to find a collection of at most k sets such that the number of elements covered by at least one set is maximized. In Max Unique Coverage, the problem is to find a collection of at most k sets such that the number of elements covered by exactly one set is maximized. These problems are closely related to a range of graph problems including matching, partial vertex cover, and capacitated maximum cut. In the data stream model, we assume k is given and the sets are revealed online. Our goal is to design single-pass algorithms that use space that is sublinear in the input size. Our main algorithmic results are: - If the sets have size at most d, there exist single-pass algorithms using O(d^{d+1} k^d) space that solve both problems exactly. This is optimal up to polylogarithmic factors for constant d. - If each element appears in at most r sets, we present single pass algorithms using Õ(k² r/ε³) space that return a 1+ε approximation in the case of Max Coverage. We also present a single-pass algorithm using slightly more memory, i.e., Õ(k³ r/ε⁴) space, that 1+ε approximates Max Unique Coverage. In contrast to the above results, when d and r are arbitrary, any constant pass 1+ε approximation algorithm for either problem requires Ω(ε^{-2}m) space but a single pass O(ε^{-2}mk) space algorithm exists. In fact any constant-pass algorithm with an approximation better than e/(e-1) and e^{1-1/k} for Max Coverage and Max Unique Coverage respectively requires Ω(m/k²) space when d and r are unrestricted. En route, we also obtain an algorithm for a parameterized version of the streaming Set Cover problem.

Cite as

Andrew McGregor, David Tench, and Hoa T. Vu. Maximum Coverage in the Data Stream Model: Parameterized and Generalized. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{mcgregor_et_al:LIPIcs.ICDT.2021.12,
  author =	{McGregor, Andrew and Tench, David and Vu, Hoa T.},
  title =	{{Maximum Coverage in the Data Stream Model: Parameterized and Generalized}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{12:1--12:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.12},
  URN =		{urn:nbn:de:0030-drops-137208},
  doi =		{10.4230/LIPIcs.ICDT.2021.12},
  annote =	{Keywords: Data streams, maximum coverage, maximum unique coverage, set cover}
}
Document
Distributed Dense Subgraph Detection and Low Outdegree Orientation

Authors: Hsin-Hao Su and Hoa T. Vu

Published in: LIPIcs, Volume 179, 34th International Symposium on Distributed Computing (DISC 2020)


Abstract
The densest subgraph problem, introduced in the 80s by Picard and Queyranne [Networks 1982] as well as Goldberg [Tech. Report 1984], is a classic problem in combinatorial optimization with a wide range of applications. The lowest outdegree orientation problem is known to be its dual problem. We study both the problem of finding dense subgraphs and the problem of computing a low outdegree orientation in the distributed settings. Suppose G = (V,E) is the underlying network as well as the input graph. Let D denote the density of the maximum density subgraph of G. Our main results are as follows. - Given a value D̃ ≤ D and 0 < ε < 1, we show that a subgraph with density at least (1-ε)D̃ can be identified deterministically in O((log n) / ε) rounds in the LOCAL model. We also present a lower bound showing that our result for the LOCAL model is tight up to an O(log n) factor. In the CONGEST~ model, we show that such a subgraph can be identified in O((log³ n) / ε³) rounds with high probability. Our techniques also lead to an O(diameter + (log⁴ n)/ε⁴)-round algorithm that yields a 1-ε approximation to the densest subgraph. This improves upon the previous O(diameter /ε ⋅ log n)-round algorithm by Das Sarma et al. [DISC 2012] that only yields a 1/2-ε approximation. - Given an integer D̃ ≥ D and Ω(1/D̃) < ε < 1/4, we give a deterministic, Õ((log² n) /ε²)-round algorithm in the CONGEST~ model that computes an orientation where the outdegree of every vertex is upper bounded by (1+ε)D̃. Previously, the best deterministic algorithm and randomized algorithm by Harris [FOCS 2019] run in Õ((log⁶ n)/ ε⁴) rounds and Õ((log³ n) /ε³) rounds respectively and only work in the LOCAL model.

Cite as

Hsin-Hao Su and Hoa T. Vu. Distributed Dense Subgraph Detection and Low Outdegree Orientation. In 34th International Symposium on Distributed Computing (DISC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 179, pp. 15:1-15:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{su_et_al:LIPIcs.DISC.2020.15,
  author =	{Su, Hsin-Hao and Vu, Hoa T.},
  title =	{{Distributed Dense Subgraph Detection and Low Outdegree Orientation}},
  booktitle =	{34th International Symposium on Distributed Computing (DISC 2020)},
  pages =	{15:1--15:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-168-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{179},
  editor =	{Attiya, Hagit},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.DISC.2020.15},
  URN =		{urn:nbn:de:0030-drops-130938},
  doi =		{10.4230/LIPIcs.DISC.2020.15},
  annote =	{Keywords: Distributed Algorithms, Network Algorithms}
}
Document
Distributed Data Summarization in Well-Connected Networks

Authors: Hsin-Hao Su and Hoa T. Vu

Published in: LIPIcs, Volume 146, 33rd International Symposium on Distributed Computing (DISC 2019)


Abstract
We study distributed algorithms for some fundamental problems in data summarization. Given a communication graph G of n nodes each of which may hold a value initially, we focus on computing sum_{i=1}^N g(f_i), where f_i is the number of occurrences of value i and g is some fixed function. This includes important statistics such as the number of distinct elements, frequency moments, and the empirical entropy of the data. In the CONGEST~ model, a simple adaptation from streaming lower bounds shows that it requires Omega~(D+ n) rounds, where D is the diameter of the graph, to compute some of these statistics exactly. However, these lower bounds do not hold for graphs that are well-connected. We give an algorithm that computes sum_{i=1}^{N} g(f_i) exactly in {tau_{G}} * 2^{O(sqrt{log n})} rounds where {tau_{G}} is the mixing time of G. This also has applications in computing the top k most frequent elements. We demonstrate that there is a high similarity between the GOSSIP~ model and the CONGEST~ model in well-connected graphs. In particular, we show that each round of the GOSSIP~ model can be simulated almost perfectly in O~({tau_{G}}) rounds of the CONGEST~ model. To this end, we develop a new algorithm for the GOSSIP~ model that 1 +/- epsilon approximates the p-th frequency moment F_p = sum_{i=1}^N f_i^p in O~(epsilon^{-2} n^{1-k/p}) rounds , for p >= 2, when the number of distinct elements F_0 is at most O(n^{1/(k-1)}). This result can be translated back to the CONGEST~ model with a factor O~({tau_{G}}) blow-up in the number of rounds.

Cite as

Hsin-Hao Su and Hoa T. Vu. Distributed Data Summarization in Well-Connected Networks. In 33rd International Symposium on Distributed Computing (DISC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 146, pp. 33:1-33:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{su_et_al:LIPIcs.DISC.2019.33,
  author =	{Su, Hsin-Hao and Vu, Hoa T.},
  title =	{{Distributed Data Summarization in Well-Connected Networks}},
  booktitle =	{33rd International Symposium on Distributed Computing (DISC 2019)},
  pages =	{33:1--33:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-126-9},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{146},
  editor =	{Suomela, Jukka},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.DISC.2019.33},
  URN =		{urn:nbn:de:0030-drops-113400},
  doi =		{10.4230/LIPIcs.DISC.2019.33},
  annote =	{Keywords: Distributed Algorithms, Network Algorithms, Data Summarization}
}
Document
Better Streaming Algorithms for the Maximum Coverage Problem

Authors: Andrew McGregor and Hoa T. Vu

Published in: LIPIcs, Volume 68, 20th International Conference on Database Theory (ICDT 2017)


Abstract
We study the classic NP-Hard problem of finding the maximum k-set coverage in the data stream model: given a set system of m sets that are subsets of a universe {1,...,n}, find the k sets that cover the most number of distinct elements. The problem can be approximated up to a factor 1-1/e in polynomial time. In the streaming-set model, the sets and their elements are revealed online. The main goal of our work is to design algorithms, with approximation guarantees as close as possible to 1-1/e, that use sublinear space o(mn). Our main results are: 1) Two (1-1/e-epsilon) approximation algorithms: One uses O(1/epsilon) passes and O(k/epsilon^2 polylog(m,n)) space whereas the other uses only a single pass but O(m/epsilon^2 polylog(m,n)) space. 2) We show that any approximation factor better than (1-(1-1/k)^k) in constant passes require space that is linear in m for constant k even if the algorithm is allowed unbounded processing time. We also demonstrate a single-pass, (1-epsilon) approximation algorithm using O(m/epsilon^2 min(k,1/epsilon) polylog(m,n)) space. We also study the maximum k-vertex coverage problem in the dynamic graph stream model. In this model, the stream consists of edge insertions and deletions of a graph on N vertices. The goal is to find k vertices that cover the most number of distinct edges. We show that any constant approximation in constant passes requires space that is linear in N for constant k whereas O(N/epsilon^2 polylog(m,n)) space is sufficient for a (1-epsilon) approximation and arbitrary k in a single pass. For regular graphs, we show that O(k/epsilon^3 polylog(m,n)) space is sufficient for a (1-epsilon) approximation in a single pass. We generalize this to a K-epsilon approximation when the ratio between the minimum and maximum degree is bounded below by K.

Cite as

Andrew McGregor and Hoa T. Vu. Better Streaming Algorithms for the Maximum Coverage Problem. In 20th International Conference on Database Theory (ICDT 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 68, pp. 22:1-22:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{mcgregor_et_al:LIPIcs.ICDT.2017.22,
  author =	{McGregor, Andrew and Vu, Hoa T.},
  title =	{{Better Streaming Algorithms for the Maximum Coverage Problem}},
  booktitle =	{20th International Conference on Database Theory (ICDT 2017)},
  pages =	{22:1--22:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-024-8},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{68},
  editor =	{Benedikt, Michael and Orsi, Giorgio},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2017.22},
  URN =		{urn:nbn:de:0030-drops-70620},
  doi =		{10.4230/LIPIcs.ICDT.2017.22},
  annote =	{Keywords: algorithms, data streams, approximation, maximum coverage}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail