DROPS

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.75

Sharp Noisy Binary Search with Monotonic Probabilities

Authors: Lucas Gretta and Eric Price

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

We revisit the noisy binary search model of [Karp and Kleinberg, 2007], in which we have n coins with unknown probabilities p_i that we can flip. The coins are sorted by increasing p_i, and we would like to find where the probability crosses a target value τ, to within ε. This generalizes the fixed-noise model of [Burnashev and Zigangirov, 1974], in which p_i = 1/2 ± ε, to a setting where coins near the target may be indistinguishable from it. It was shown in [Karp and Kleinberg, 2007] that Θ((1/ε²) log n) samples are necessary and sufficient for this task. We produce a practical algorithm by solving two theoretical challenges: high-probability behavior and sharp constants. We give an algorithm that succeeds with probability 1-δ from 1/C_{τ, ε} ⋅ (log₂ n + O(log^{2/3} n log^{1/3}(1/δ) + log(1/δ))) samples, where C_{τ, ε} is the optimal such constant achievable. For δ > n^{-o(1)} this is within 1 + o(1) of optimal, and for δ ≪ 1 it is the first bound within constant factors of optimal.
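
To make the problem setup concrete, here is a minimal sketch of the naive baseline: ordinary binary search in which each comparison is replaced by repeated flips of the middle coin. This is not the algorithm of the paper and makes no claim about its sample bounds; the function name, the `flip` callback, and the `flips_per_step` parameter are all assumptions introduced for illustration.

```python
def naive_noisy_binary_search(flip, n, tau, flips_per_step):
    """Estimate how many of the n sorted coins have heads-probability below tau.

    flip(i) is a caller-supplied function (hypothetical interface) that flips
    coin i once and returns True for heads. Each step flips the middle coin
    flips_per_step times and compares the empirical heads rate with tau.
    """
    if flips_per_step <= 0:
        raise ValueError("flips_per_step must be positive")
    lo, hi = 0, n  # the estimated crossing point lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        heads = sum(1 for _ in range(flips_per_step) if flip(mid))
        if heads / flips_per_step < tau:
            lo = mid + 1  # coin mid looks below the target
        else:
            hi = mid      # coin mid looks at or above the target
    return lo
```

With noiseless coins this reduces to standard binary search. With noisy coins, a single wrong comparison sends the search into the wrong half for good, which is one reason the high-probability regime discussed in the abstract is harder than this baseline suggests.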

Cite as

Lucas Gretta and Eric Price. Sharp Noisy Binary Search with Monotonic Probabilities. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 75:1-75:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{gretta_et_al:LIPIcs.ICALP.2024.75,
  author =	{Gretta, Lucas and Price, Eric},
  title =	{{Sharp Noisy Binary Search with Monotonic Probabilities}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{75:1--75:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.75},
  URN =		{urn:nbn:de:0030-drops-202188},
  doi =		{10.4230/LIPIcs.ICALP.2024.75},
  annote =	{Keywords: fine-grained algorithms, randomized/probabilistic methods, sublinear/streaming algorithms, noisy binary search}
}

Document

RANDOM

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2021.49

L1 Regression with Lewis Weights Subsampling

Authors: Aditya Parulekar, Advait Parulekar, and Eric Price

Published in: LIPIcs, Volume 207, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)

Abstract

We consider the problem of finding an approximate solution to 𝓁₁ regression while only observing a small number of labels. Given an n × d unlabeled data matrix X, we must choose a small set of m ≪ n rows to observe the labels of, then output an estimate β̂ whose error on the original problem is within a 1 + ε factor of optimal. We show that sampling from X according to its Lewis weights and outputting the empirical minimizer succeeds with probability 1-δ for m > O((1/ε²) d log(d/(εδ))). This is analogous to the performance of sampling according to leverage scores for 𝓁₂ regression, but with exponentially better dependence on δ. We also give a corresponding lower bound of Ω(d/ε² + (d + 1/ε²) log(1/δ)).

Cite as

Aditya Parulekar, Advait Parulekar, and Eric Price. L1 Regression with Lewis Weights Subsampling. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 49:1-49:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{parulekar_et_al:LIPIcs.APPROX/RANDOM.2021.49,
  author =	{Parulekar, Aditya and Parulekar, Advait and Price, Eric},
  title =	{{L1 Regression with Lewis Weights Subsampling}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)},
  pages =	{49:1--49:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-207-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{207},
  editor =	{Wootters, Mary and Sanit\`{a}, Laura},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2021.49},
  URN =		{urn:nbn:de:0030-drops-147422},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2021.49},
  annote =	{Keywords: Active regression, Lewis weights}
}

Document

DOI: 10.4230/LIPIcs.CCC.2021.37

A Simple Proof of a New Set Disjointness with Applications to Data Streams

Authors: Akshay Kamath, Eric Price, and David P. Woodruff

Published in: LIPIcs, Volume 200, 36th Computational Complexity Conference (CCC 2021)

Abstract

The multiplayer promise set disjointness problem is one of the most widely used problems from communication complexity in applications. In this problem there are k players with subsets S¹, …, S^k, each drawn from {1, 2, …, n}, and we are promised that either the sets are (1) pairwise disjoint, or (2) there is a unique element j occurring in all the sets, which are otherwise pairwise disjoint. The total communication of solving this problem with constant probability in the blackboard model is Ω(n/k). We observe that for most applications, it instead suffices to look at what we call the "mostly" set disjointness problem, which changes case (2) to say there is a unique element j occurring in at least half of the sets, and the sets are otherwise disjoint. This change gives us a much simpler proof of an Ω(n/k) randomized total communication lower bound, avoiding Hellinger distance and Poincaré inequalities. Our proof also gives strong lower bounds for high probability protocols, which are much larger than what is possible for the set disjointness problem. Using this we show several new results for data streams: 1) for 𝓁₂-Heavy Hitters, any O(1)-pass streaming algorithm in the insertion-only model for detecting if an ε-𝓁₂-heavy hitter exists requires min((1/ε²) log(ε²n/δ), (1/ε) n^{1/2}) bits of memory, which is optimal up to a log n factor. For deterministic algorithms and constant ε, this gives an Ω(n^{1/2}) lower bound, improving the prior Ω(log n) lower bound. We also obtain lower bounds for Zipfian distributions. 2) for 𝓁_p-Estimation, p > 2, we show an O(1)-pass Ω(n^{1-2/p} log(1/δ)) bit lower bound for outputting an O(1)-approximation with probability 1-δ, in the insertion-only model. This is optimal, and the best previous lower bound was Ω(n^{1-2/p} + log(1/δ)). 3) for low rank approximation of a sparse matrix in ℝ^{d × n}, if we see the rows of a matrix one at a time in the row-order model, each row having O(1) non-zero entries, any deterministic algorithm requires Ω(√d) memory to output an O(1)-approximate rank-1 approximation. Finally, we consider strict and general turnstile streaming models, and show separations between sketching lower bounds and non-sketching upper bounds for the heavy hitters problem.

Cite as

Akshay Kamath, Eric Price, and David P. Woodruff. A Simple Proof of a New Set Disjointness with Applications to Data Streams. In 36th Computational Complexity Conference (CCC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 200, pp. 37:1-37:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{kamath_et_al:LIPIcs.CCC.2021.37,
  author =	{Kamath, Akshay and Price, Eric and Woodruff, David P.},
  title =	{{A Simple Proof of a New Set Disjointness with Applications to Data Streams}},
  booktitle =	{36th Computational Complexity Conference (CCC 2021)},
  pages =	{37:1--37:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-193-1},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{200},
  editor =	{Kabanets, Valentine},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2021.37},
  URN =		{urn:nbn:de:0030-drops-143119},
  doi =		{10.4230/LIPIcs.CCC.2021.37},
  annote =	{Keywords: Streaming algorithms, heavy hitters, communication complexity, information complexity}
}

Document

RANDOM

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2020.13

A Fast Binary Splitting Approach to Non-Adaptive Group Testing

Authors: Eric Price and Jonathan Scarlett

Published in: LIPIcs, Volume 176, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)

Abstract

In this paper, we consider the problem of noiseless non-adaptive group testing under the for-each recovery guarantee, also known as probabilistic group testing. In the case of n items and k defectives, we provide an algorithm attaining high-probability recovery with O(k log n) scaling in both the number of tests and runtime, improving on the best known O(k² log k ⋅ log n) runtime previously available for any algorithm that only uses O(k log n) tests. Our algorithm bears resemblance to Hwang’s adaptive generalized binary splitting algorithm (Hwang, 1972); we recursively work with groups of items of geometrically vanishing sizes, while maintaining a list of "possibly defective" groups and circumventing the need for adaptivity. While the most basic form of our algorithm requires Ω(n) storage, we also provide a low-storage variant based on hashing, with similar recovery guarantees.

Cite as

Eric Price and Jonathan Scarlett. A Fast Binary Splitting Approach to Non-Adaptive Group Testing. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 176, pp. 13:1-13:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


@InProceedings{price_et_al:LIPIcs.APPROX/RANDOM.2020.13,
  author =	{Price, Eric and Scarlett, Jonathan},
  title =	{{A Fast Binary Splitting Approach to Non-Adaptive Group Testing}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)},
  pages =	{13:1--13:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-164-1},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{176},
  editor =	{Byrka, Jaros{\l}aw and Meka, Raghu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2020.13},
  URN =		{urn:nbn:de:0030-drops-126165},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2020.13},
  annote =	{Keywords: Group testing, sparsity, sublinear-time decoding, binary splitting}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2019.36

Estimating the Frequency of a Clustered Signal

Authors: Xue Chen and Eric Price

Published in: LIPIcs, Volume 132, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)

Abstract

We consider the problem of locating a signal whose frequencies are "off grid" and clustered in a narrow band. Given noisy sample access to a function g(t) with Fourier spectrum in a narrow range [f_0 - Δ, f_0 + Δ], how accurately is it possible to identify f_0? We present generic conditions on g that allow for efficient, accurate estimates of the frequency. We then show bounds on these conditions for k-Fourier-sparse signals that imply recovery of f_0 to within Δ + Õ(k^3) from samples on [-1, 1]. This improves upon the best previous bound of O(Δ + Õ(k^5))^{1.5}. We also show that no algorithm can do better than Δ + Õ(k^2). In the process we provide a new Õ(k^3) bound on the ratio between the maximum and average value of continuous k-Fourier-sparse signals, which has independent application.

Cite as

Xue Chen and Eric Price. Estimating the Frequency of a Clustered Signal. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 36:1-36:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


@InProceedings{chen_et_al:LIPIcs.ICALP.2019.36,
  author =	{Chen, Xue and Price, Eric},
  title =	{{Estimating the Frequency of a Clustered Signal}},
  booktitle =	{46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)},
  pages =	{36:1--36:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-109-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{132},
  editor =	{Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.36},
  URN =		{urn:nbn:de:0030-drops-106128},
  doi =		{10.4230/LIPIcs.ICALP.2019.36},
  annote =	{Keywords: sublinear algorithms, Fourier transform}
}

Document

DOI: 10.4230/OASIcs.SOSA.2019.19

Compressed Sensing with Adversarial Sparse Noise via L1 Regression

Authors: Sushrut Karmalkar and Eric Price

Published in: OASIcs, Volume 69, 2nd Symposium on Simplicity in Algorithms (SOSA 2019)

Abstract

We present a simple and effective algorithm for the problem of sparse robust linear regression. In this problem, one would like to estimate a sparse vector w^* ∈ ℝ^n from linear measurements corrupted by sparse noise that can arbitrarily change an adversarially chosen η fraction of measured responses y, as well as introduce bounded norm noise to the responses. For Gaussian measurements, we show that a simple algorithm based on L1 regression can successfully estimate w^* for any η < η_0 ≈ 0.239, and that this threshold is tight for the algorithm. The number of measurements required by the algorithm is O(k log(n/k)) for k-sparse estimation, which is within constant factors of the number needed without any sparse noise. Of the three properties we show - the ability to estimate sparse, as well as dense, w^*; the tolerance of a large constant fraction of outliers; and tolerance of adversarial rather than distributional (e.g., Gaussian) dense noise - to the best of our knowledge, no previous polynomial time algorithm was known to achieve more than two.

Cite as

Sushrut Karmalkar and Eric Price. Compressed Sensing with Adversarial Sparse Noise via L1 Regression. In 2nd Symposium on Simplicity in Algorithms (SOSA 2019). Open Access Series in Informatics (OASIcs), Volume 69, pp. 19:1-19:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


@InProceedings{karmalkar_et_al:OASIcs.SOSA.2019.19,
  author =	{Karmalkar, Sushrut and Price, Eric},
  title =	{{Compressed Sensing with Adversarial Sparse Noise via L1 Regression}},
  booktitle =	{2nd Symposium on Simplicity in Algorithms (SOSA 2019)},
  pages =	{19:1--19:19},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-099-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{69},
  editor =	{Fineman, Jeremy T. and Mitzenmacher, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SOSA.2019.19},
  URN =		{urn:nbn:de:0030-drops-100455},
  doi =		{10.4230/OASIcs.SOSA.2019.19},
  annote =	{Keywords: Robust Regression, Compressed Sensing}
}

Document

DOI: 10.4230/LIPIcs.ICALP.2018.41

Sample-Optimal Identity Testing with High Probability

Authors: Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price

Published in: LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

Abstract

We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution p over n elements, an explicitly given distribution q, and parameters 0 < ε, δ < 1, we wish to distinguish, with probability at least 1-δ, whether the distributions are identical versus ε-far in total variation distance. Most prior work focused on the case that δ = Ω(1), for which the sample complexity of identity testing is known to be Θ(√n/ε²). Given such an algorithm, one can achieve arbitrarily small values of δ via black-box amplification, which multiplies the required number of samples by Θ(log(1/δ)). We show that black-box amplification is suboptimal for any δ = o(1), and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is Θ((1/ε²)(√(n log(1/δ)) + log(1/δ))) for any n, ε, and δ. For the special case of uniformity testing, where the given distribution is the uniform distribution U_n over the domain, our new tester is surprisingly simple: to test whether p = U_n versus d_{TV}(p, U_n) ≥ ε, we simply threshold d_{TV}(p̂, U_n), where p̂ is the empirical probability distribution. The fact that this simple "plug-in" estimator is sample-optimal is surprising, even in the constant δ case. Indeed, it was believed that such a tester would not attain sublinear sample complexity even for constant values of ε and δ. An important contribution of this work lies in the analysis techniques that we introduce in this context. First, we exploit an underlying strong convexity property to bound from below the expectation gap in the completeness and soundness cases. Second, we give a new, fast method for obtaining provably correct empirical estimates of the true worst-case failure probability for a broad class of uniformity testing statistics over all possible input distributions - including all previously studied statistics for this problem. We believe that our novel analysis techniques will be useful for other distribution testing problems as well.
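
The "plug-in" uniformity statistic described in the abstract can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, samples are assumed to be integers in range(n), and the abstract does not state the threshold value, so `threshold` is left as a caller-supplied parameter.

```python
from collections import Counter

def empirical_tv_from_uniform(samples, n):
    """Total variation distance between the empirical distribution of
    `samples` (integers in range(n)) and the uniform distribution on n items."""
    m = len(samples)
    if m == 0:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    # Observed elements contribute |count/m - 1/n| each.
    seen = sum(abs(c / m - 1 / n) for c in counts.values())
    # Each unobserved element has empirical mass 0, so it contributes 1/n.
    unseen = (n - len(counts)) / n
    return 0.5 * (seen + unseen)

def plug_in_uniformity_test(samples, n, threshold):
    """Return True (accept "uniform") when the statistic is at most threshold.

    `threshold` is an assumption supplied by the caller; the abstract does
    not specify how it should be chosen.
    """
    return empirical_tv_from_uniform(samples, n) <= threshold
```

For example, four samples that each hit a distinct element of a 4-element domain give a statistic of 0, while four samples that all hit the same element give 0.75.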

Cite as

Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price. Sample-Optimal Identity Testing with High Probability. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 41:1-41:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


@InProceedings{diakonikolas_et_al:LIPIcs.ICALP.2018.41,
  author =	{Diakonikolas, Ilias and Gouleakis, Themis and Peebles, John and Price, Eric},
  title =	{{Sample-Optimal Identity Testing with High Probability}},
  booktitle =	{45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)},
  pages =	{41:1--41:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-076-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{107},
  editor =	{Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.41},
  URN =		{urn:nbn:de:0030-drops-90459},
  doi =		{10.4230/LIPIcs.ICALP.2018.41},
  annote =	{Keywords: distribution testing, property testing, sample complexity}
}

Document

DOI: 10.4230/LIPIcs.APPROX-RANDOM.2017.44

Testing Hereditary Properties of Sequences

Authors: Cody R. Freitag, Eric Price, and William J. Swartworth

Published in: LIPIcs, Volume 81, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017)

Abstract

A hereditary property of a sequence is one that is preserved when restricting to subsequences. We show that there exist hereditary properties of sequences that cannot be tested with sublinear queries, resolving an open question posed by Newman et al. This proof relies crucially on an infinite alphabet, however; for finite alphabets, we observe that any hereditary property can be tested with a constant number of queries.

Cite as

Cody R. Freitag, Eric Price, and William J. Swartworth. Testing Hereditary Properties of Sequences. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 81, pp. 44:1-44:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


@InProceedings{freitag_et_al:LIPIcs.APPROX-RANDOM.2017.44,
  author =	{Freitag, Cody R. and Price, Eric and Swartworth, William J.},
  title =	{{Testing Hereditary Properties of Sequences}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017)},
  pages =	{44:1--44:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-044-6},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{81},
  editor =	{Jansen, Klaus and Rolim, Jos\'{e} D. P. and Williamson, David P. and Vempala, Santosh S.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2017.44},
  URN =		{urn:nbn:de:0030-drops-75938},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2017.44},
  annote =	{Keywords: Property Testing}
}

Document

DOI: 10.4230/LIPIcs.ICALP.2017.59

Fast Regression with an $\ell_\infty$ Guarantee

Authors: Eric Price, Zhao Song, and David P. Woodruff

Published in: LIPIcs, Volume 80, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)

Abstract

Sketching has emerged as a powerful technique for speeding up problems in numerical linear algebra, such as regression. In the overconstrained regression problem, one is given an n × d matrix A, with n ≫ d, as well as an n × 1 vector b, and one wants to find a vector x̂ so as to minimize the residual error ‖Ax-b‖_2. Using the sketch-and-solve paradigm, one first computes S ⋅ A and S ⋅ b for a randomly chosen matrix S, then outputs x' = (SA)^† Sb so as to minimize ‖SAx' - Sb‖_2. The sketch-and-solve paradigm gives a bound on ‖x'-x^*‖_2 when A is well-conditioned. Our main result is that, when S is the subsampled randomized Fourier/Hadamard transform, the error x' - x^* behaves as if it lies in a "random" direction within this bound: for any fixed direction a ∈ ℝ^d, we have with 1 - d^{-c} probability that (1) ⟨a, x'-x^*⟩ ≲ ‖a‖_2 ‖x'-x^*‖_2 / d^{1/2-γ}, where c, γ > 0 are arbitrary constants. This implies ‖x'-x^*‖_∞ is a factor d^{1/2-γ} smaller than ‖x'-x^*‖_2. It also gives a better bound on the generalization of x' to new examples: if rows of A correspond to examples and columns to features, then our result gives a better bound for the error introduced by sketch-and-solve when classifying fresh examples. We show that not all oblivious subspace embeddings S satisfy these properties. In particular, we give counterexamples showing that matrices based on Count-Sketch or leverage score sampling do not satisfy these properties. We also provide lower bounds, both on how small ‖x'-x^*‖_2 can be, and for our new guarantee (1), showing that the subsampled randomized Fourier/Hadamard transform is nearly optimal. Our lower bound on ‖x'-x^*‖_2 shows that there is an O(1/ε) separation in the dimension of the optimal oblivious subspace embedding required for outputting an x' for which ‖x'-x^*‖_2 ≤ ε ‖Ax^*-b‖_2 ⋅ ‖A^†‖_2, compared to the dimension of the optimal oblivious subspace embedding required for outputting an x' for which ‖Ax'-b‖_2 ≤ (1+ε)‖Ax^*-b‖_2, that is, the former problem requires dimension Ω(d/ε²) while the latter problem can be solved with dimension O(d/ε). This explains the reason known upper bounds on the dimensions of these two variants of regression have differed in prior work.

Cite as

Eric Price, Zhao Song, and David P. Woodruff. Fast Regression with an $\ell_\infty$ Guarantee. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 59:1-59:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


@InProceedings{price_et_al:LIPIcs.ICALP.2017.59,
  author =	{Price, Eric and Song, Zhao and Woodruff, David P.},
  title =	{{Fast Regression with an $\ell_\infty$ Guarantee}},
  booktitle =	{44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)},
  pages =	{59:1--59:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-041-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{80},
  editor =	{Chatzigiannakis, Ioannis and Indyk, Piotr and Kuhn, Fabian and Muscholl, Anca},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2017.59},
  URN =		{urn:nbn:de:0030-drops-74488},
  doi =		{10.4230/LIPIcs.ICALP.2017.59},
  annote =	{Keywords: Linear regression, Count-Sketch, Gaussians, Leverage scores, $\ell_\infty$-guarantee}
}

Search Results

Documents authored by Price, Eric
