Document

**Published in:** LIPIcs, Volume 244, 30th Annual European Symposium on Algorithms (ESA 2022)

We study the problem of estimating the number of edges in an n-vertex graph, accessed via the Bipartite Independent Set query model introduced by Beame et al. (TALG '20). In this model, each query returns a Boolean, indicating the existence of at least one edge between two specified sets of nodes. We present a non-adaptive algorithm that returns a (1± ε) relative error approximation to the number of edges, with query complexity Õ(ε^{-5}log⁵ n), where Õ(⋅) hides poly(log log n) dependencies. This is the first non-adaptive algorithm in this setting achieving poly(1/ε,log n) query complexity. Prior work requires Ω(log² n) rounds of adaptivity. We avoid this by taking a fundamentally different approach, inspired by work on single-pass streaming algorithms. Moreover, for constant ε, our query complexity significantly improves on the best known adaptive algorithm due to Bhattacharya et al. (STACS '22), which requires O(ε^{-2} log^{11} n) queries. Building on our edge estimation result, we give the first {non-adaptive} algorithm for outputting a nearly uniformly sampled edge with query complexity Õ(ε^{-6} log⁶ n), improving on the works of Dell et al. (SODA '20) and Bhattacharya et al. (STACS '22), which require Ω(log³ n) rounds of adaptivity. Finally, as a consequence of our edge sampling algorithm, we obtain a Õ(n log^8 n) query algorithm for connectivity, using two rounds of adaptivity. This improves on a three-round algorithm of Assadi et al. (ESA '21) and is tight; there is no non-adaptive algorithm for connectivity making o(n²) queries.

Raghavendra Addanki, Andrew McGregor, and Cameron Musco. Non-Adaptive Edge Counting and Sampling via Bipartite Independent Set Queries. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 2:1-2:16, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{addanki_et_al:LIPIcs.ESA.2022.2, author = {Addanki, Raghavendra and McGregor, Andrew and Musco, Cameron}, title = {{Non-Adaptive Edge Counting and Sampling via Bipartite Independent Set Queries}}, booktitle = {30th Annual European Symposium on Algorithms (ESA 2022)}, pages = {2:1--2:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-247-1}, ISSN = {1868-8969}, year = {2022}, volume = {244}, editor = {Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2022.2}, URN = {urn:nbn:de:0030-drops-169400}, doi = {10.4230/LIPIcs.ESA.2022.2}, annote = {Keywords: sublinear graph algorithms, bipartite independent set queries, edge sampling and counting, graph connectivity, query adaptivity} }

Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 229, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)

We consider the problem of reconstructing a graph G in two natural sampling models: 1) each sample corresponds to a random induced subgraph and 2) for a fixed adjacency matrix A_G for G, each sample corresponds to a random principal submatrix (i.e., a submatrix formed by deleting the same set of rows and columns) of A_G. We refer to these models as the "unordered" and "ordered" models respectively. The two models are motivated by work on the reconstruction conjecture in combinatorics and trace reconstruction in theoretical computer science. Despite the superficial similarities between the models, we show that the sample complexity of reconstruction can be exponentially different. Our main results are as follows:
- In the unordered model, we show that almost all graphs can be reconstructed with Θ(p^{-2} log n) samples if each node is included in the random subgraph with any constant probability p; this is optimal. We show our upper bound extends to smaller values of p as well. In contrast, for arbitrary graphs, we show that exp(Ω(n)) samples are required for reconstruction even for 2-regular graphs. One of the key technical steps in the first result is showing that, with high probability, any subgraph isomorphism in a random graph has at most O(log n) non-fixed points.
- In the ordered model, we show that any graph with constant arboricity or degeneracy (i.e., every induced subgraph has constant average degree) can be reconstructed with exp(Õ(n^{1/3})) samples and that arbitrary graphs can be reconstructed with exp(Õ(n^{1/2})) samples. The results about almost all graphs in the first model carry over to the second. One of the key technical steps in the first result is showing that reconstruction of low degeneracy graphs can be reduced to learning a small number of moments of sets of the form {i-j: j < i,(i,j) ∈ E} and {j-i: i < j,(i,j) ∈ E} where G = ([n],E) is the unknown graph.

Andrew McGregor and Rik Sengupta. Graph Reconstruction from Random Subgraphs. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 96:1-96:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{mcgregor_et_al:LIPIcs.ICALP.2022.96, author = {McGregor, Andrew and Sengupta, Rik}, title = {{Graph Reconstruction from Random Subgraphs}}, booktitle = {49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)}, pages = {96:1--96:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-235-8}, ISSN = {1868-8969}, year = {2022}, volume = {229}, editor = {Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2022.96}, URN = {urn:nbn:de:0030-drops-164373}, doi = {10.4230/LIPIcs.ICALP.2022.96}, annote = {Keywords: graph reconstruction, sample complexity, deletion channel} }

Document

**Published in:** LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)

Given an n-point metric space ({𝒳},d) where each point belongs to one of m = O(1) different categories or groups and a set of integers k₁, …, k_m, the fair Max-Min diversification problem is to select k_i points belonging to category i ∈ [m], such that the minimum pairwise distance between selected points is maximized. The problem was introduced by Moumoulidou et al. [ICDT 2021] and is motivated by the need to down-sample large data sets in various applications so that the derived sample achieves a balance over diversity, i.e., the minimum distance between a pair of selected points, and fairness, i.e., ensuring enough points of each category are included. We prove the following results:
1) We first consider general metric spaces. We present a randomized polynomial time algorithm that returns a factor 2-approximation to the diversity but only satisfies the fairness constraints in expectation. Building upon this result, we present a 6-approximation that is guaranteed to satisfy the fairness constraints up to a factor 1-ε for any constant ε. We also present a linear time algorithm returning an m+1 approximation with exact fairness. The best previous result was a 3m-1 approximation.
2) We then focus on Euclidean metrics. We first show that the problem can be solved exactly in one dimension. {For constant dimensions, categories and any constant ε > 0, we present a 1+ε approximation algorithm that runs in O(nk) + 2^{O(k)} time where k = k₁+…+k_m.} We can improve the running time to O(nk)+poly(k) at the expense of only picking (1-ε) k_i points from category i ∈ [m].
Finally, we present algorithms suitable to processing massive data sets including single-pass data stream algorithms and composable coresets for the distributed processing.

Raghavendra Addanki, Andrew McGregor, Alexandra Meliou, and Zafeiria Moumoulidou. Improved Approximation and Scalability for Fair Max-Min Diversification. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 7:1-7:21, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{addanki_et_al:LIPIcs.ICDT.2022.7, author = {Addanki, Raghavendra and McGregor, Andrew and Meliou, Alexandra and Moumoulidou, Zafeiria}, title = {{Improved Approximation and Scalability for Fair Max-Min Diversification}}, booktitle = {25th International Conference on Database Theory (ICDT 2022)}, pages = {7:1--7:21}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-223-5}, ISSN = {1868-8969}, year = {2022}, volume = {220}, editor = {Olteanu, Dan and Vortmeier, Nils}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.7}, URN = {urn:nbn:de:0030-drops-158812}, doi = {10.4230/LIPIcs.ICDT.2022.7}, annote = {Keywords: algorithmic fairness, diversity maximization, data selection, approximation algorithms} }

Document

**Published in:** LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)

We present algorithms for the Max Coverage and Max Unique Coverage problems in the data stream model. The input to both problems are m subsets of a universe of size n and a value k ∈ [m]. In Max Coverage, the problem is to find a collection of at most k sets such that the number of elements covered by at least one set is maximized. In Max Unique Coverage, the problem is to find a collection of at most k sets such that the number of elements covered by exactly one set is maximized. These problems are closely related to a range of graph problems including matching, partial vertex cover, and capacitated maximum cut. In the data stream model, we assume k is given and the sets are revealed online. Our goal is to design single-pass algorithms that use space that is sublinear in the input size. Our main algorithmic results are:
- If the sets have size at most d, there exist single-pass algorithms using O(d^{d+1} k^d) space that solve both problems exactly. This is optimal up to polylogarithmic factors for constant d.
- If each element appears in at most r sets, we present single pass algorithms using Õ(k² r/ε³) space that return a 1+ε approximation in the case of Max Coverage. We also present a single-pass algorithm using slightly more memory, i.e., Õ(k³ r/ε⁴) space, that 1+ε approximates Max Unique Coverage. In contrast to the above results, when d and r are arbitrary, any constant pass 1+ε approximation algorithm for either problem requires Ω(ε^{-2}m) space but a single pass O(ε^{-2}mk) space algorithm exists. In fact any constant-pass algorithm with an approximation better than e/(e-1) and e^{1-1/k} for Max Coverage and Max Unique Coverage respectively requires Ω(m/k²) space when d and r are unrestricted. En route, we also obtain an algorithm for a parameterized version of the streaming Set Cover problem.

Andrew McGregor, David Tench, and Hoa T. Vu. Maximum Coverage in the Data Stream Model: Parameterized and Generalized. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 12:1-12:20, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{mcgregor_et_al:LIPIcs.ICDT.2021.12, author = {McGregor, Andrew and Tench, David and Vu, Hoa T.}, title = {{Maximum Coverage in the Data Stream Model: Parameterized and Generalized}}, booktitle = {24th International Conference on Database Theory (ICDT 2021)}, pages = {12:1--12:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-179-5}, ISSN = {1868-8969}, year = {2021}, volume = {186}, editor = {Yi, Ke and Wei, Zhewei}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.12}, URN = {urn:nbn:de:0030-drops-137208}, doi = {10.4230/LIPIcs.ICDT.2021.12}, annote = {Keywords: Data streams, maximum coverage, maximum unique coverage, set cover} }

Document

**Published in:** LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)

Diversity is an important principle in data selection and summarization, facility location, and recommendation systems. Our work focuses on maximizing diversity in data selection, while offering fairness guarantees. In particular, we offer the first study that augments the Max-Min diversification objective with fairness constraints. More specifically, given a universe 𝒰 of n elements that can be partitioned into m disjoint groups, we aim to retrieve a k-sized subset that maximizes the pairwise minimum distance within the set (diversity) and contains a pre-specified k_i number of elements from each group i (fairness). We show that this problem is NP-complete even in metric spaces, and we propose three novel algorithms, linear in n, that provide strong theoretical approximation guarantees for different values of m and k. Finally, we extend our algorithms and analysis to the case where groups can be overlapping.

Zafeiria Moumoulidou, Andrew McGregor, and Alexandra Meliou. Diverse Data Selection under Fairness Constraints. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 13:1-13:25, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{moumoulidou_et_al:LIPIcs.ICDT.2021.13, author = {Moumoulidou, Zafeiria and McGregor, Andrew and Meliou, Alexandra}, title = {{Diverse Data Selection under Fairness Constraints}}, booktitle = {24th International Conference on Database Theory (ICDT 2021)}, pages = {13:1--13:25}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-179-5}, ISSN = {1868-8969}, year = {2021}, volume = {186}, editor = {Yi, Ke and Wei, Zhewei}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.13}, URN = {urn:nbn:de:0030-drops-137216}, doi = {10.4230/LIPIcs.ICDT.2021.13}, annote = {Keywords: data selection, diversity maximization, fairness constraints, approximation algorithms} }

Document

**Published in:** LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)

In the beautifully simple-to-state problem of trace reconstruction, the goal is to reconstruct an unknown binary string x given random "traces" of x where each trace is generated by deleting each coordinate of x independently with probability p<1. The problem is well studied both when the unknown string is arbitrary and when it is chosen uniformly at random. For both settings, there is still an exponential gap between upper and lower sample complexity bounds and our understanding of the problem is still surprisingly limited. In this paper, we consider natural parameterizations and generalizations of this problem in an effort to attain a deeper and more comprehensive understanding. Perhaps our most surprising results are:
1) We prove that exp(O(n^(1/4) sqrt{log n})) traces suffice for reconstructing arbitrary matrices. In the matrix version of the problem, each row and column of an unknown sqrt{n} x sqrt{n} matrix is deleted independently with probability p. Our results contrasts with the best known results for sequence reconstruction where the best known upper bound is exp(O(n^(1/3))).
2) An optimal result for random matrix reconstruction: we show that Theta(log n) traces are necessary and sufficient. This is in contrast to the problem for random sequences where there is a super-logarithmic lower bound and the best known upper bound is exp({O}(log^(1/3) n)).
3) We show that exp(O(k^(1/3) log^(2/3) n)) traces suffice to reconstruct k-sparse strings, providing an improvement over the best known sequence reconstruction results when k = o(n/log^2 n).
4) We show that poly(n) traces suffice if x is k-sparse and we additionally have a "separation" promise, specifically that the indices of 1’s in x all differ by Omega(k log n).

Akshay Krishnamurthy, Arya Mazumdar, Andrew McGregor, and Soumyabrata Pal. Trace Reconstruction: Generalized and Parameterized. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 68:1-68:25, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{krishnamurthy_et_al:LIPIcs.ESA.2019.68, author = {Krishnamurthy, Akshay and Mazumdar, Arya and McGregor, Andrew and Pal, Soumyabrata}, title = {{Trace Reconstruction: Generalized and Parameterized}}, booktitle = {27th Annual European Symposium on Algorithms (ESA 2019)}, pages = {68:1--68:25}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-124-5}, ISSN = {1868-8969}, year = {2019}, volume = {144}, editor = {Bender, Michael A. and Svensson, Ola and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.68}, URN = {urn:nbn:de:0030-drops-111891}, doi = {10.4230/LIPIcs.ESA.2019.68}, annote = {Keywords: deletion channel, trace reconstruction, matrix reconstruction} }

Document

**Published in:** OASIcs, Volume 61, 1st Symposium on Simplicity in Algorithms (SOSA 2018)

We present a simple single-pass data stream algorithm using O((log n)/eps^2) space that returns an (alpha + 2)(1 + eps) approximation to the size of the maximum matching in a graph of arboricity alpha.

Andrew McGregor and Sofya Vorotnikova. A Simple, Space-Efficient, Streaming Algorithm for Matchings in Low Arboricity Graphs. In 1st Symposium on Simplicity in Algorithms (SOSA 2018). Open Access Series in Informatics (OASIcs), Volume 61, pp. 14:1-14:4, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{mcgregor_et_al:OASIcs.SOSA.2018.14, author = {McGregor, Andrew and Vorotnikova, Sofya}, title = {{A Simple, Space-Efficient, Streaming Algorithm for Matchings in Low Arboricity Graphs}}, booktitle = {1st Symposium on Simplicity in Algorithms (SOSA 2018)}, pages = {14:1--14:4}, series = {Open Access Series in Informatics (OASIcs)}, ISBN = {978-3-95977-064-4}, ISSN = {2190-6807}, year = {2018}, volume = {61}, editor = {Seidel, Raimund}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SOSA.2018.14}, URN = {urn:nbn:de:0030-drops-82959}, doi = {10.4230/OASIcs.SOSA.2018.14}, annote = {Keywords: data streams, matching, planar graphs, arboricity} }

Document

**Published in:** LIPIcs, Volume 68, 20th International Conference on Database Theory (ICDT 2017)

We study the classic NP-Hard problem of finding the maximum k-set coverage in the data stream model: given a set system of m sets that are subsets of a universe {1,...,n}, find the k sets that cover the most number of distinct elements. The problem can be approximated up to a factor 1-1/e in polynomial time. In the streaming-set model, the sets and their elements are revealed online. The main goal of our work is to design algorithms, with approximation guarantees as close as possible to 1-1/e, that use sublinear space o(mn). Our main results are: 1) Two (1-1/e-epsilon) approximation algorithms: One uses O(1/epsilon) passes and O(k/epsilon^2 polylog(m,n)) space whereas the other uses only a single pass but O(m/epsilon^2 polylog(m,n)) space. 2) We show that any approximation factor better than (1-(1-1/k)^k) in constant passes require space that is linear in m for constant k even if the algorithm is allowed unbounded processing time. We also demonstrate a single-pass, (1-epsilon) approximation algorithm using O(m/epsilon^2 min(k,1/epsilon) polylog(m,n)) space.
We also study the maximum k-vertex coverage problem in the dynamic graph stream model. In this model, the stream consists of edge insertions and deletions of a graph on N vertices. The goal is to find k vertices that cover the most number of distinct edges. We show that any constant approximation in constant passes requires space that is linear in N for constant k whereas O(N/epsilon^2 polylog(m,n)) space is sufficient for a (1-epsilon) approximation and arbitrary k in a single pass. For regular graphs, we show that O(k/epsilon^3 polylog(m,n)) space is sufficient for a (1-epsilon) approximation in a single pass. We generalize this to a K-epsilon approximation when the ratio between the minimum and maximum degree is bounded below by K.

Andrew McGregor and Hoa T. Vu. Better Streaming Algorithms for the Maximum Coverage Problem. In 20th International Conference on Database Theory (ICDT 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 68, pp. 22:1-22:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{mcgregor_et_al:LIPIcs.ICDT.2017.22, author = {McGregor, Andrew and Vu, Hoa T.}, title = {{Better Streaming Algorithms for the Maximum Coverage Problem}}, booktitle = {20th International Conference on Database Theory (ICDT 2017)}, pages = {22:1--22:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-024-8}, ISSN = {1868-8969}, year = {2017}, volume = {68}, editor = {Benedikt, Michael and Orsi, Giorgio}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2017.22}, URN = {urn:nbn:de:0030-drops-70620}, doi = {10.4230/LIPIcs.ICDT.2017.22}, annote = {Keywords: algorithms, data streams, approximation, maximum coverage} }

Document

**Published in:** LIPIcs, Volume 60, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016)

We present data stream algorithms for estimating the size or weight of the maximum matching in low arboricity graphs. A large body of work has focused on improving the constant approximation factor for general graphs when the data stream algorithm is permitted O(n polylog n) space where n is the number of nodes. This space is necessary if the algorithm must return the matching. Recently, Esfandiari et al. (SODA 2015) showed that it was possible to estimate the maximum cardinality of a matching in a planar graph up to a factor of 24+epsilon using O(epsilon^{-2} n^{2/3} polylog n) space. We first present an algorithm (with a simple analysis) that improves this to a factor 5+epsilon using the same space. We also improve upon the previous results for other graphs with bounded arboricity. We then present a factor 12.5 approximation for matching in planar graphs that can be implemented using O(log n) space in the adjacency list data stream model where the stream is a concatenation of the adjacency lists of the graph. The main idea behind our results is finding "local" fractional matchings, i.e., fractional matchings where the value of any edge e is solely determined by the edges sharing an endpoint with e. Our work also improves upon the results for the dynamic data stream model where the stream consists of a sequence of edges being inserted and deleted from the graph. We also extend our results to weighted graphs, improving over the bounds given by Bury and Schwiegelshohn (ESA 2015), via a reduction to the unweighted problem that increases the approximation by at most a factor of two.

Andrew McGregor and Sofya Vorotnikova. Planar Matching in Streams Revisited. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 60, pp. 17:1-17:12, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{mcgregor_et_al:LIPIcs.APPROX-RANDOM.2016.17, author = {McGregor, Andrew and Vorotnikova, Sofya}, title = {{Planar Matching in Streams Revisited}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016)}, pages = {17:1--17:12}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-018-7}, ISSN = {1868-8969}, year = {2016}, volume = {60}, editor = {Jansen, Klaus and Mathieu, Claire and Rolim, Jos\'{e} D. P. and Umans, Chris}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2016.17}, URN = {urn:nbn:de:0030-drops-66407}, doi = {10.4230/LIPIcs.APPROX-RANDOM.2016.17}, annote = {Keywords: data streams, planar graphs, arboricity, matchings} }

Document

**Published in:** LIPIcs, Volume 57, 24th Annual European Symposium on Algorithms (ESA 2016)

We address the trade-off between the computational resources needed to process a large data set and the number of samples available from the data set. Specifically, we consider the following abstraction: we receive a potentially infinite stream of IID samples from some unknown distribution D, and are tasked with computing some function f(D). If the stream is observed for time t, how much memory, s, is required to estimate f(D)? We refer to t as the sample complexity and s as the space complexity. The main focus of this paper is investigating the trade-offs between the space and sample complexity. We study these trade-offs for several canonical problems studied in the data stream model: estimating the collision probability, i.e., the second moment of a distribution, deciding if a graph is connected, and approximating the dimension of an unknown subspace. Our results are based on techniques for simulating different classical sampling procedures in this model, emulating random walks given a sequence of IID samples, as well as leveraging a characterization between communication bounded protocols and statistical query algorithms.

Michael Crouch, Andrew McGregor, Gregory Valiant, and David P. Woodruff. Stochastic Streams: Sample Complexity vs. Space Complexity. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 32:1-32:15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{crouch_et_al:LIPIcs.ESA.2016.32, author = {Crouch, Michael and McGregor, Andrew and Valiant, Gregory and Woodruff, David P.}, title = {{Stochastic Streams: Sample Complexity vs. Space Complexity}}, booktitle = {24th Annual European Symposium on Algorithms (ESA 2016)}, pages = {32:1--32:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-015-6}, ISSN = {1868-8969}, year = {2016}, volume = {57}, editor = {Sankowski, Piotr and Zaroliagis, Christos}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2016.32}, URN = {urn:nbn:de:0030-drops-63838}, doi = {10.4230/LIPIcs.ESA.2016.32}, annote = {Keywords: data streams, sample complexity, frequency moments, graph connectivity, subspace approximation} }

Document

**Published in:** LIPIcs, Volume 33, 30th Conference on Computational Complexity (CCC 2015)

In the setting of streaming interactive proofs (SIPs), a client (verifier) needs to compute a given function on a massive stream of data, arriving online, but is unable to store even a small fraction of the data. It outsources the processing to a third party service (prover), but is unwilling to blindly trust answers returned by this service. Thus, the service cannot simply supply the desired answer; it must convince the verifier of its correctness via a short interaction after the stream has been seen.
In this work we study "barely interactive" SIPs. Specifically, we show that two or three rounds of interaction suffice to solve several query problems - including Index, Median, Nearest Neighbor Search, Pattern Matching, and Range Counting - with polylogarithmic space and communication costs. Such efficiency with O(1) rounds of interaction was thought to be impossible based on previous work.
On the other hand, we initiate a formal study of the limitations of constant-round SIPs by introducing a new hierarchy of communication models called Online Interactive Proofs (OIPs). The online nature of these models is analogous to the streaming restriction placed upon the verifier in an SIP. We give upper and lower bounds that (1) characterize, up to quadratic blowups, every finite level of the OIP hierarchy in terms of other well-known communication complexity classes, (2) separate the first four levels of the hierarchy, and (3) reveal that the hierarchy collapses to the fourth level. Our study of OIPs reveals marked contrasts and some parallels with the classic Turing Machine theory of interactive proofs, establishes limits on the power of existing techniques for developing constant-round SIPs, and provides a new characterization of (non-online) Arthur-Merlin communication in terms of an online model.

Amit Chakrabarti, Graham Cormode, Andrew McGregor, Justin Thaler, and Suresh Venkatasubramanian. Verifiable Stream Computation and Arthur–Merlin Communication. In 30th Conference on Computational Complexity (CCC 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 33, pp. 217-243, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)

Copy BibTex To Clipboard

@InProceedings{chakrabarti_et_al:LIPIcs.CCC.2015.217, author = {Chakrabarti, Amit and Cormode, Graham and McGregor, Andrew and Thaler, Justin and Venkatasubramanian, Suresh}, title = {{Verifiable Stream Computation and Arthur–Merlin Communication}}, booktitle = {30th Conference on Computational Complexity (CCC 2015)}, pages = {217--243}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-81-1}, ISSN = {1868-8969}, year = {2015}, volume = {33}, editor = {Zuckerman, David}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2015.217}, URN = {urn:nbn:de:0030-drops-50680}, doi = {10.4230/LIPIcs.CCC.2015.217}, annote = {Keywords: Arthur-Merlin communication complexity, streaming interactive proofs} }

Document

**Published in:** LIPIcs, Volume 9, 28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011)

Given a stream of $(x,y)$ points, we consider the problem of finding univariate polynomials that best fit the data. Over finite fields, this problem encompasses the well-studied problem of decoding Reed-Solomon codes while over the reals it corresponds to the well-studied polynomial regression problem.
We present one-pass algorithms for two natural problems: i) find the polynomial of a given degree $k$ that minimizes the error and ii) find the polynomial of smallest degree that interpolates through the points with at most a given error bound. We consider a range of error models including the average error per point, the maximum error, and the number of points that are not fitted exactly. Many of our results apply to both the reals and finite fields. As a consequence we also solve an open question regarding the tolerant testing of codes in the data stream model.

Andrew McGregor, Atri Rudra, and Steve Uurtamo. Polynomial Fitting of Data Streams with Applications to Codeword Testing. In 28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011). Leibniz International Proceedings in Informatics (LIPIcs), Volume 9, pp. 428-439, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2011)

Copy BibTex To Clipboard

@InProceedings{mcgregor_et_al:LIPIcs.STACS.2011.428, author = {McGregor, Andrew and Rudra, Atri and Uurtamo, Steve}, title = {{Polynomial Fitting of Data Streams with Applications to Codeword Testing}}, booktitle = {28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011)}, pages = {428--439}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-25-5}, ISSN = {1868-8969}, year = {2011}, volume = {9}, editor = {Schwentick, Thomas and D\"{u}rr, Christoph}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2011.428}, URN = {urn:nbn:de:0030-drops-30322}, doi = {10.4230/LIPIcs.STACS.2011.428}, annote = {Keywords: Streaming, Polynomial Interpolation, Polynomial Regression} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail