Document

RANDOM

**Published in:** LIPIcs, Volume 207, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)

We consider the problem of finding an approximate solution to 𝓁₁ regression while only observing a small number of labels. Given an n × d unlabeled data matrix X, we must choose a small set of m ≪ n rows to observe the labels of, then output an estimate β̂ whose error on the original problem is within a 1 + ε factor of optimal. We show that sampling from X according to its Lewis weights and outputting the empirical minimizer succeeds with probability 1-δ for m > O(1/(ε²) d log d/(ε δ)). This is analogous to the performance of sampling according to leverage scores for 𝓁₂ regression, but with exponentially better dependence on δ. We also give a corresponding lower bound of Ω(d/(ε²) + (d + 1/(ε²)) log 1/(δ)).

Aditya Parulekar, Advait Parulekar, and Eric Price. L1 Regression with Lewis Weights Subsampling. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 49:1-49:21, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{parulekar_et_al:LIPIcs.APPROX/RANDOM.2021.49, author = {Parulekar, Aditya and Parulekar, Advait and Price, Eric}, title = {{L1 Regression with Lewis Weights Subsampling}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)}, pages = {49:1--49:21}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-207-5}, ISSN = {1868-8969}, year = {2021}, volume = {207}, editor = {Wootters, Mary and Sanit\`{a}, Laura}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2021.49}, URN = {urn:nbn:de:0030-drops-147422}, doi = {10.4230/LIPIcs.APPROX/RANDOM.2021.49}, annote = {Keywords: Active regression, Lewis weights} }

Document

**Published in:** LIPIcs, Volume 200, 36th Computational Complexity Conference (CCC 2021)

The multiplayer promise set disjointness is one of the most widely used problems from communication complexity in applications. In this problem there are k players with subsets S¹, …, S^k, each drawn from {1, 2, …, n}, and we are promised that either the sets are (1) pairwise disjoint, or (2) there is a unique element j occurring in all the sets, which are otherwise pairwise disjoint. The total communication of solving this problem with constant probability in the blackboard model is Ω(n/k).
We observe for most applications, it instead suffices to look at what we call the "mostly" set disjointness problem, which changes case (2) to say there is a unique element j occurring in at least half of the sets, and the sets are otherwise disjoint. This change gives us a much simpler proof of an Ω(n/k) randomized total communication lower bound, avoiding Hellinger distance and Poincare inequalities. Our proof also gives strong lower bounds for high probability protocols, which are much larger than what is possible for the set disjointness problem. Using this we show several new results for data streams:
1) for 𝓁₂-Heavy Hitters, any O(1)-pass streaming algorithm in the insertion-only model for detecting if an ε-𝓁₂-heavy hitter exists requires min(1/(ε²)log((ε²n)/δ), 1/(ε)n^{1/2}) bits of memory, which is optimal up to a log n factor. For deterministic algorithms and constant ε, this gives an Ω(n^{1/2}) lower bound, improving the prior Ω(log n) lower bound. We also obtain lower bounds for Zipfian distributions.
2) for 𝓁_p-Estimation, p > 2, we show an O(1)-pass Ω(n^{1-2/p} log(1/δ)) bit lower bound for outputting an O(1)- approximation with probability 1-δ, in the insertion-only model. This is optimal, and the best previous lower bound was Ω(n^{1-2/p} + log(1/δ)).
3) for low rank approximation of a sparse matrix in ℝ^{d× n}, if we see the rows of a matrix one at a time in the row-order model, each row having O(1) non-zero entries, any deterministic algorithm requires Ω(√d) memory to output an O(1)-approximate rank-1 approximation. Finally, we consider strict and general turnstile streaming models, and show separations between sketching lower bounds and non-sketching upper bounds for the heavy hitters problem.

Akshay Kamath, Eric Price, and David P. Woodruff. A Simple Proof of a New Set Disjointness with Applications to Data Streams. In 36th Computational Complexity Conference (CCC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 200, pp. 37:1-37:24, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{kamath_et_al:LIPIcs.CCC.2021.37, author = {Kamath, Akshay and Price, Eric and Woodruff, David P.}, title = {{A Simple Proof of a New Set Disjointness with Applications to Data Streams}}, booktitle = {36th Computational Complexity Conference (CCC 2021)}, pages = {37:1--37:24}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-193-1}, ISSN = {1868-8969}, year = {2021}, volume = {200}, editor = {Kabanets, Valentine}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2021.37}, URN = {urn:nbn:de:0030-drops-143119}, doi = {10.4230/LIPIcs.CCC.2021.37}, annote = {Keywords: Streaming algorithms, heavy hitters, communication complexity, information complexity} }

Document

RANDOM

**Published in:** LIPIcs, Volume 176, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)

In this paper, we consider the problem of noiseless non-adaptive group testing under the for-each recovery guarantee, also known as probabilistic group testing. In the case of n items and k defectives, we provide an algorithm attaining high-probability recovery with O(k log n) scaling in both the number of tests and runtime, improving on the best known O(k² log k ⋅ log n) runtime previously available for any algorithm that only uses O(k log n) tests. Our algorithm bears resemblance to Hwang’s adaptive generalized binary splitting algorithm (Hwang, 1972); we recursively work with groups of items of geometrically vanishing sizes, while maintaining a list of "possibly defective" groups and circumventing the need for adaptivity. While the most basic form of our algorithm requires Ω(n) storage, we also provide a low-storage variant based on hashing, with similar recovery guarantees.

Eric Price and Jonathan Scarlett. A Fast Binary Splitting Approach to Non-Adaptive Group Testing. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 176, pp. 13:1-13:20, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{price_et_al:LIPIcs.APPROX/RANDOM.2020.13, author = {Price, Eric and Scarlett, Jonathan}, title = {{A Fast Binary Splitting Approach to Non-Adaptive Group Testing}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)}, pages = {13:1--13:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-164-1}, ISSN = {1868-8969}, year = {2020}, volume = {176}, editor = {Byrka, Jaros{\l}aw and Meka, Raghu}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2020.13}, URN = {urn:nbn:de:0030-drops-126165}, doi = {10.4230/LIPIcs.APPROX/RANDOM.2020.13}, annote = {Keywords: Group testing, sparsity, sublinear-time decoding, binary splitting} }

Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 132, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)

We consider the problem of locating a signal whose frequencies are "off grid" and clustered in a narrow band. Given noisy sample access to a function g(t) with Fourier spectrum in a narrow range [f_0 - Delta, f_0 + Delta], how accurately is it possible to identify f_0? We present generic conditions on g that allow for efficient, accurate estimates of the frequency. We then show bounds on these conditions for k-Fourier-sparse signals that imply recovery of f_0 to within Delta + O~(k^3) from samples on [-1, 1]. This improves upon the best previous bound of O(Delta + O~(k^5))^{1.5}. We also show that no algorithm can do better than Delta + O~(k^2).
In the process we provide a new O~(k^3) bound on the ratio between the maximum and average value of continuous k-Fourier-sparse signals, which has independent application.

Xue Chen and Eric Price. Estimating the Frequency of a Clustered Signal. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 36:1-36:13, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.ICALP.2019.36, author = {Chen, Xue and Price, Eric}, title = {{Estimating the Frequency of a Clustered Signal}}, booktitle = {46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)}, pages = {36:1--36:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-109-2}, ISSN = {1868-8969}, year = {2019}, volume = {132}, editor = {Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.36}, URN = {urn:nbn:de:0030-drops-106128}, doi = {10.4230/LIPIcs.ICALP.2019.36}, annote = {Keywords: sublinear algorithms, Fourier transform} }

Document

**Published in:** OASIcs, Volume 69, 2nd Symposium on Simplicity in Algorithms (SOSA 2019)

We present a simple and effective algorithm for the problem of sparse robust linear regression. In this problem, one would like to estimate a sparse vector w^* in R^n from linear measurements corrupted by sparse noise that can arbitrarily change an adversarially chosen eta fraction of measured responses y, as well as introduce bounded norm noise to the responses.
For Gaussian measurements, we show that a simple algorithm based on L1 regression can successfully estimate w^* for any eta < eta_0 ~~ 0.239, and that this threshold is tight for the algorithm. The number of measurements required by the algorithm is O(k log n/k) for k-sparse estimation, which is within constant factors of the number needed without any sparse noise.
Of the three properties we show - the ability to estimate sparse, as well as dense, w^*; the tolerance of a large constant fraction of outliers; and tolerance of adversarial rather than distributional (e.g., Gaussian) dense noise - to the best of our knowledge, no previous polynomial time algorithm was known to achieve more than two.

Sushrut Karmalkar and Eric Price. Compressed Sensing with Adversarial Sparse Noise via L1 Regression. In 2nd Symposium on Simplicity in Algorithms (SOSA 2019). Open Access Series in Informatics (OASIcs), Volume 69, pp. 19:1-19:19, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{karmalkar_et_al:OASIcs.SOSA.2019.19, author = {Karmalkar, Sushrut and Price, Eric}, title = {{Compressed Sensing with Adversarial Sparse Noise via L1 Regression}}, booktitle = {2nd Symposium on Simplicity in Algorithms (SOSA 2019)}, pages = {19:1--19:19}, series = {Open Access Series in Informatics (OASIcs)}, ISBN = {978-3-95977-099-6}, ISSN = {2190-6807}, year = {2019}, volume = {69}, editor = {Fineman, Jeremy T. and Mitzenmacher, Michael}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SOSA.2019.19}, URN = {urn:nbn:de:0030-drops-100455}, doi = {10.4230/OASIcs.SOSA.2019.19}, annote = {Keywords: Robust Regression, Compressed Sensing} }

Document

**Published in:** LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution p over n elements, an explicitly given distribution q, and parameters 0< epsilon, delta < 1, we wish to distinguish, with probability at least 1-delta, whether the distributions are identical versus epsilon-far in total variation distance. Most prior work focused on the case that delta = Omega(1), for which the sample complexity of identity testing is known to be Theta(sqrt{n}/epsilon^2). Given such an algorithm, one can achieve arbitrarily small values of delta via black-box amplification, which multiplies the required number of samples by Theta(log(1/delta)).
We show that black-box amplification is suboptimal for any delta = o(1), and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is Theta((1/epsilon^2) (sqrt{n log(1/delta)} + log(1/delta))) for any n, epsilon, and delta. For the special case of uniformity testing, where the given distribution is the uniform distribution U_n over the domain, our new tester is surprisingly simple: to test whether p = U_n versus d_{TV} (p, U_n) >= epsilon, we simply threshold d_{TV}({p^}, U_n), where {p^} is the empirical probability distribution. The fact that this simple "plug-in" estimator is sample-optimal is surprising, even in the constant delta case. Indeed, it was believed that such a tester would not attain sublinear sample complexity even for constant values of epsilon and delta.
An important contribution of this work lies in the analysis techniques that we introduce in this context. First, we exploit an underlying strong convexity property to bound from below the expectation gap in the completeness and soundness cases. Second, we give a new, fast method for obtaining provably correct empirical estimates of the true worst-case failure probability for a broad class of uniformity testing statistics over all possible input distributions - including all previously studied statistics for this problem. We believe that our novel analysis techniques will be useful for other distribution testing problems as well.

Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price. Sample-Optimal Identity Testing with High Probability. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 41:1-41:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{diakonikolas_et_al:LIPIcs.ICALP.2018.41, author = {Diakonikolas, Ilias and Gouleakis, Themis and Peebles, John and Price, Eric}, title = {{Sample-Optimal Identity Testing with High Probability}}, booktitle = {45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)}, pages = {41:1--41:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-076-7}, ISSN = {1868-8969}, year = {2018}, volume = {107}, editor = {Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.41}, URN = {urn:nbn:de:0030-drops-90459}, doi = {10.4230/LIPIcs.ICALP.2018.41}, annote = {Keywords: distribution testing, property testing, sample complexity} }

Document

**Published in:** LIPIcs, Volume 81, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017)

A hereditary property of a sequence is one that is preserved when restricting to subsequences. We show that there exist hereditary properties of sequences that cannot be tested with sublinear queries, resolving an open question posed by Newman et al. This proof relies crucially on an infinite alphabet, however; for finite alphabets, we observe that any hereditary property can be tested with a constant number of queries.

Cody R. Freitag, Eric Price, and William J. Swartworth. Testing Hereditary Properties of Sequences. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 81, pp. 44:1-44:10, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{freitag_et_al:LIPIcs.APPROX-RANDOM.2017.44, author = {Freitag, Cody R. and Price, Eric and Swartworth, William J.}, title = {{Testing Hereditary Properties of Sequences}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017)}, pages = {44:1--44:10}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-044-6}, ISSN = {1868-8969}, year = {2017}, volume = {81}, editor = {Jansen, Klaus and Rolim, Jos\'{e} D. P. and Williamson, David P. and Vempala, Santosh S.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2017.44}, URN = {urn:nbn:de:0030-drops-75938}, doi = {10.4230/LIPIcs.APPROX-RANDOM.2017.44}, annote = {Keywords: Property Testing} }

Document

**Published in:** LIPIcs, Volume 80, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)

Sketching has emerged as a powerful technique for speeding up problems in numerical linear algebra, such as regression. In the overconstrained regression problem, one is given an n x d matrix A, with n >> d, as well as an n x 1 vector b, and one wants to find a vector \hat{x} so as to minimize the residual error ||Ax-b||_2. Using the sketch and solve paradigm, one first computes S \cdot A and S \cdot b for a randomly chosen matrix S, then outputs x' = (SA)^{\dagger} Sb so as to minimize || SAx' - Sb||_2.
The sketch-and-solve paradigm gives a bound on ||x'-x^*||_2 when A is well-conditioned. Our main result is that, when S is the subsampled randomized Fourier/Hadamard transform, the error x' - x^* behaves as if it lies in a "random" direction within this bound: for any fixed direction a in R^d, we have with 1 - d^{-c} probability that
(1) \langle a, x'-x^* \rangle \lesssim \frac{ \|a\|_2\|x'-x^*\|_2}{d^{\frac{1}{2}-\gamma}},
where c, \gamma > 0 are arbitrary constants. This implies ||x'-x^*||_{\infty} is a factor d^{\frac{1}{2}-\gamma} smaller than ||x'-x^*||_2. It also gives a better bound on the generalization of x' to new examples: if rows of A correspond to examples and columns to features, then our result gives a better bound for the error introduced by sketch-and-solve when classifying fresh examples. We show that not all oblivious subspace embeddings S satisfy these properties. In particular, we give counterexamples showing that matrices based on Count-Sketch or leverage score sampling do not satisfy these properties.
We also provide lower bounds, both on how small ||x'-x^*||_2 can be, and for our new guarantee (1), showing that the subsampled randomized Fourier/Hadamard transform is nearly optimal. Our lower bound on ||x'-x^*||_2 shows that there is an O(1/epsilon) separation in the dimension of the optimal oblivious subspace embedding required for outputting an x' for which ||x'-x^*||_2 <= epsilon ||Ax^*-b||_2 \cdot ||A^{\dagger}||_2$, compared to the dimension of the optimal oblivious subspace embedding required for outputting an x' for which ||Ax'-b||_2 <= (1+epsilon)||Ax^*-b||_2, that is, the former problem requires dimension Omega(d/epsilon^2) while the latter problem can be solved with dimension O(d/epsilon). This explains the reason known upper bounds on the dimensions of these two variants of regression have differed in prior work.

Eric Price, Zhao Song, and David P. Woodruff. Fast Regression with an $ell_infty$ Guarantee. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 59:1-59:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{price_et_al:LIPIcs.ICALP.2017.59, author = {Price, Eric and Song, Zhao and Woodruff, David P.}, title = {{Fast Regression with an \$ell\underlineinfty\$ Guarantee}}, booktitle = {44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)}, pages = {59:1--59:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-041-5}, ISSN = {1868-8969}, year = {2017}, volume = {80}, editor = {Chatzigiannakis, Ioannis and Indyk, Piotr and Kuhn, Fabian and Muscholl, Anca}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2017.59}, URN = {urn:nbn:de:0030-drops-74488}, doi = {10.4230/LIPIcs.ICALP.2017.59}, annote = {Keywords: Linear regression, Count-Sketch, Gaussians, Leverage scores, ell\underlineinfty-guarantee} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail