Document

**Published in:** LIPIcs, Volume 287, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024)

We revisit the problem of estimating the profile (also known as the rarity) in the data stream model. Given a sequence of m elements from a universe of size n, its profile is a vector ϕ whose i-th entry ϕ_i represents the number of distinct elements that appear in the stream exactly i times. A classic paper by Datar and Muthukrishan from 2002 gave an algorithm which estimates any entry ϕ_i up to an additive error of ± ε D using O(1/ε² (log n + log m)) bits of space, where D is the number of distinct elements in the stream.
In this paper, we considerably improve on this result by designing an algorithm which simultaneously estimates many coordinates of the profile vector ϕ up to small overall error. We give an algorithm which, with constant probability, produces an estimated profile ϕˆ with the following guarantees in terms of space and estimation error:
b) For any constant τ, with O(1 / ε² + log n) bits of space, ∑_{i = 1}^τ |ϕ_i - ϕˆ_i| ≤ ε D.
c) With O(1/ ε²log (1/ε) + log n + log log m) bits of space, ∑_{i = 1}^m |ϕ_i - ϕˆ_i| ≤ ε m. In addition to bounding the error across multiple coordinates, our space bounds separate the terms that depend on 1/ε and those that depend on n and m. We prove matching lower bounds on space in both regimes.
Application of our profile estimation algorithm gives estimates within error ± ε D of several symmetric functions of frequencies in O(1/ε² + log n) bits. This generalizes space-optimal algorithms for the distinct elements problems to other problems including estimating the Huber and Tukey losses as well as frequency cap statistics.

Justin Y. Chen, Piotr Indyk, and David P. Woodruff. Space-Optimal Profile Estimation in Data Streams with Applications to Symmetric Functions. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 32:1-32:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.ITCS.2024.32, author = {Chen, Justin Y. and Indyk, Piotr and Woodruff, David P.}, title = {{Space-Optimal Profile Estimation in Data Streams with Applications to Symmetric Functions}}, booktitle = {15th Innovations in Theoretical Computer Science Conference (ITCS 2024)}, pages = {32:1--32:22}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-309-6}, ISSN = {1868-8969}, year = {2024}, volume = {287}, editor = {Guruswami, Venkatesan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2024.32}, URN = {urn:nbn:de:0030-drops-195605}, doi = {10.4230/LIPIcs.ITCS.2024.32}, annote = {Keywords: Streaming and Sketching Algorithms, Sublinear Algorithms} }

Document

**Published in:** LIPIcs, Volume 215, 13th Innovations in Theoretical Computer Science Conference (ITCS 2022)

A* is a classic and popular method for graphs search and path finding. It assumes the existence of a heuristic function h(u,t) that estimates the shortest distance from any input node u to the destination t. Traditionally, heuristics have been handcrafted by domain experts. However, over the last few years, there has been a growing interest in learning heuristic functions. Such learned heuristics estimate the distance between given nodes based on "features" of those nodes.
In this paper we formalize and initiate the study of such feature-based heuristics. In particular, we consider heuristics induced by norm embeddings and distance labeling schemes, and provide lower bounds for the tradeoffs between the number of dimensions or bits used to represent each graph node, and the running time of the A* algorithm. We also show that, under natural assumptions, our lower bounds are almost optimal.

Talya Eden, Piotr Indyk, and Haike Xu. Embeddings and Labeling Schemes for A*. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 215, pp. 62:1-62:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{eden_et_al:LIPIcs.ITCS.2022.62, author = {Eden, Talya and Indyk, Piotr and Xu, Haike}, title = {{Embeddings and Labeling Schemes for A*}}, booktitle = {13th Innovations in Theoretical Computer Science Conference (ITCS 2022)}, pages = {62:1--62:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-217-4}, ISSN = {1868-8969}, year = {2022}, volume = {215}, editor = {Braverman, Mark}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2022.62}, URN = {urn:nbn:de:0030-drops-156585}, doi = {10.4230/LIPIcs.ITCS.2022.62}, annote = {Keywords: A* algorithm, path finding, graph search} }

Document

**Published in:** LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

In the Sparse Linear Regression (SLR) problem, given a d x n matrix M and a d-dimensional query q, the goal is to compute a k-sparse n-dimensional vector tau such that the error ||M tau - q|| is minimized. This problem is equivalent to the following geometric problem: given a set P of n points and a query point q in d dimensions, find the closest k-dimensional subspace to q, that is spanned by a subset of k points in P. In this paper, we present data-structures/algorithms and conditional lower bounds for several variants of this problem (such as finding the closest induced k dimensional flat/simplex instead of a subspace).
In particular, we present approximation algorithms for the online variants of the above problems with query time O~(n^{k-1}), which are of interest in the "low sparsity regime" where k is small, e.g., 2 or 3. For k=d, this matches, up to polylogarithmic factors, the lower bound that relies on the affinely degenerate conjecture (i.e., deciding if n points in R^d contains d+1 points contained in a hyperplane takes Omega(n^d) time). Moreover, our algorithms involve formulating and solving several geometric subproblems, which we believe to be of independent interest.

Sariel Har-Peled, Piotr Indyk, and Sepideh Mahabadi. Approximate Sparse Linear Regression. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 77:1-77:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{harpeled_et_al:LIPIcs.ICALP.2018.77, author = {Har-Peled, Sariel and Indyk, Piotr and Mahabadi, Sepideh}, title = {{Approximate Sparse Linear Regression}}, booktitle = {45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)}, pages = {77:1--77:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-076-7}, ISSN = {1868-8969}, year = {2018}, volume = {107}, editor = {Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.77}, URN = {urn:nbn:de:0030-drops-90816}, doi = {10.4230/LIPIcs.ICALP.2018.77}, annote = {Keywords: Sparse Linear Regression, Approximate Nearest Neighbor, Sparse Recovery, Nearest Induced Flat, Nearest Subspace Search} }

Document

**Published in:** LIPIcs, Volume 81, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017)

We study the Fractional Set Cover problem in the streaming model. That is, we consider the relaxation of the set cover problem over a universe of n elements and a collection of m sets, where each set can be picked fractionally, with a value in [0,1]. We present a randomized (1+a)-approximation algorithm that makes p passes over the data, and uses O(polylog(m,n,1/a) (mn^(O(1/(pa)))+n)) memory space. The algorithm works in both the set arrival and the edge arrival models. To the best of our knowledge, this is the first streaming result for the fractional set cover problem. We obtain our results by employing the multiplicative weights update framework in the streaming settings.

Piotr Indyk, Sepideh Mahabadi, Ronitt Rubinfeld, Jonathan Ullman, Ali Vakilian, and Anak Yodpinyanee. Fractional Set Cover in the Streaming Model. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 81, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{indyk_et_al:LIPIcs.APPROX-RANDOM.2017.12, author = {Indyk, Piotr and Mahabadi, Sepideh and Rubinfeld, Ronitt and Ullman, Jonathan and Vakilian, Ali and Yodpinyanee, Anak}, title = {{Fractional Set Cover in the Streaming Model}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017)}, pages = {12:1--12:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-044-6}, ISSN = {1868-8969}, year = {2017}, volume = {81}, editor = {Jansen, Klaus and Rolim, Jos\'{e} D. P. and Williamson, David P. and Vempala, Santosh S.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2017.12}, URN = {urn:nbn:de:0030-drops-75613}, doi = {10.4230/LIPIcs.APPROX-RANDOM.2017.12}, annote = {Keywords: Streaming Algorithms, Fractional Set Cover, LP relaxation, Multiplicative Weight Update} }

Document

Complete Volume

**Published in:** LIPIcs, Volume 80, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)

LIPIcs, Volume 80, ICALP'17, Complete Volume

44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@Proceedings{chatzigiannakis_et_al:LIPIcs.ICALP.2017, title = {{LIPIcs, Volume 80, ICALP'17, Complete Volume}}, booktitle = {44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-041-5}, ISSN = {1868-8969}, year = {2017}, volume = {80}, editor = {Chatzigiannakis, Ioannis and Indyk, Piotr and Kuhn, Fabian and Muscholl, Anca}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2017}, URN = {urn:nbn:de:0030-drops-75107}, doi = {10.4230/LIPIcs.ICALP.2017}, annote = {Keywords: Theory of Computation} }

Document

Front Matter

**Published in:** LIPIcs, Volume 80, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)

Front Matter, Table of Contents, Preface, Organization, List of Authors

44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 0:i-0:xlii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{chatzigiannakis_et_al:LIPIcs.ICALP.2017.0, author = {Chatzigiannakis, Ioannis and Indyk, Piotr and Kuhn, Fabian and Muscholl, Anca}, title = {{Front Matter, Table of Contents, Preface, Organization, List of Authors}}, booktitle = {44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)}, pages = {0:i--0:xlii}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-041-5}, ISSN = {1868-8969}, year = {2017}, volume = {80}, editor = {Chatzigiannakis, Ioannis and Indyk, Piotr and Kuhn, Fabian and Muscholl, Anca}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2017.0}, URN = {urn:nbn:de:0030-drops-73663}, doi = {10.4230/LIPIcs.ICALP.2017.0}, annote = {Keywords: Front Matter, Table of Contents, Preface, Organization, List of Authors} }

Document

**Published in:** LIPIcs, Volume 51, 32nd International Symposium on Computational Geometry (SoCG 2016)

Motivated by applications in computer vision and databases, we introduce and study the Simultaneous Nearest Neighbor Search (SNN) problem. Given a set of data points, the goal of SNN is to design a data structure that, given a collection of queries, finds a collection of close points that are compatible with each other. Formally, we are given k query points Q=q_1,...,q_k, and a compatibility graph G with vertices in Q, and the goal is to return data points p_1,...,p_k that minimize (i) the weighted sum of the distances from q_i to p_i and (ii) the weighted sum, over all edges (i,j) in the compatibility graph G, of the distances between p_i and p_j. The problem has several applications in computer vision and databases, where one wants to return a set of *consistent* answers to multiple related queries. Furthermore, it generalizes several well-studied computational problems, including Nearest Neighbor Search, Aggregate Nearest Neighbor Search and the 0-extension problem.
In this paper we propose and analyze the following general two-step method for designing efficient data structures for SNN. In the first step, for each query point q_i we find its (approximate) nearest neighbor point p'_i; this can be done efficiently using existing approximate nearest neighbor structures. In the second step, we solve an off-line optimization problem over sets q_1,...,q_k and p'_1,...,p'_k; this can be done efficiently given that k is much smaller than n. Even though p'_1,...,p'_k might not constitute the optimal answers to queries q_1,...,q_k, we show that, for the unweighted case, the resulting algorithm satisfies a O(log k/log log k)-approximation guarantee. Furthermore, we show that the approximation factor can be in fact reduced to a constant for compatibility graphs frequently occurring in practice, e.g., 2D grids, 3D grids or planar graphs.
Finally, we validate our theoretical results by preliminary experiments. In particular, we show that the empirical approximation factor provided by the above approach is very close to 1.

Piotr Indyk, Robert Kleinberg, Sepideh Mahabadi, and Yang Yuan. Simultaneous Nearest Neighbor Search. In 32nd International Symposium on Computational Geometry (SoCG 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 51, pp. 44:1-44:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{indyk_et_al:LIPIcs.SoCG.2016.44, author = {Indyk, Piotr and Kleinberg, Robert and Mahabadi, Sepideh and Yuan, Yang}, title = {{Simultaneous Nearest Neighbor Search}}, booktitle = {32nd International Symposium on Computational Geometry (SoCG 2016)}, pages = {44:1--44:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-009-5}, ISSN = {1868-8969}, year = {2016}, volume = {51}, editor = {Fekete, S\'{a}ndor and Lubiw, Anna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2016.44}, URN = {urn:nbn:de:0030-drops-59360}, doi = {10.4230/LIPIcs.SoCG.2016.44}, annote = {Keywords: Approximate Nearest Neighbor, Metric Labeling, 0-extension, Simultaneous Nearest Neighbor, Group Nearest Neighbor} }