When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICALP.2017.59
URN: urn:nbn:de:0030-drops-74488
URL: http://drops.dagstuhl.de/opus/volltexte/2017/7448/

### Fast Regression with an $\ell_\infty$ Guarantee

Sketching has emerged as a powerful technique for speeding up problems in numerical linear algebra, such as regression. In the overconstrained regression problem, one is given an $n \times d$ matrix $A$ with $n \gg d$, as well as an $n \times 1$ vector $b$, and one wants to find a vector $x^*$ minimizing the residual error $\|Ax - b\|_2$. Using the sketch-and-solve paradigm, one first computes $S \cdot A$ and $S \cdot b$ for a randomly chosen matrix $S$, and then outputs $x' = (SA)^{\dagger} Sb$, the minimizer of $\|SAx' - Sb\|_2$. The sketch-and-solve paradigm gives a bound on $\|x' - x^*\|_2$ when $A$ is well-conditioned. (A minimal code sketch of this pipeline appears after the BibTeX entry below.)

Our main result is that, when $S$ is the subsampled randomized Fourier/Hadamard transform, the error $x' - x^*$ behaves as if it lies in a "random" direction within this bound: for any fixed direction $a \in \mathbb{R}^d$, we have with probability $1 - d^{-c}$ that

$$\langle a, x' - x^* \rangle \lesssim \frac{\|a\|_2 \, \|x' - x^*\|_2}{d^{1/2 - \gamma}}, \qquad (1)$$

where $c, \gamma > 0$ are arbitrary constants. This implies that $\|x' - x^*\|_\infty$ is a factor $d^{1/2 - \gamma}$ smaller than $\|x' - x^*\|_2$. It also gives a better bound on the generalization of $x'$ to new examples: if the rows of $A$ correspond to examples and the columns to features, then our result gives a better bound on the error introduced by sketch-and-solve when classifying fresh examples.

We show that not all oblivious subspace embeddings $S$ satisfy these properties. In particular, we give counterexamples showing that matrices based on Count-Sketch or leverage score sampling do not. We also provide lower bounds, both on how small $\|x' - x^*\|_2$ can be and for our new guarantee (1), showing that the subsampled randomized Fourier/Hadamard transform is nearly optimal. Our lower bound on $\|x' - x^*\|_2$ shows that there is an $O(1/\epsilon)$ separation between the dimension of the optimal oblivious subspace embedding required for outputting an $x'$ with $\|x' - x^*\|_2 \leq \epsilon \|Ax^* - b\|_2 \cdot \|A^{\dagger}\|_2$ and the dimension required for outputting an $x'$ with $\|Ax' - b\|_2 \leq (1 + \epsilon)\|Ax^* - b\|_2$: the former problem requires dimension $\Omega(d/\epsilon^2)$, while the latter can be solved with dimension $O(d/\epsilon)$. This explains why known upper bounds on the dimensions of these two variants of regression have differed in prior work.

### BibTeX - Entry

@InProceedings{price_et_al:LIPIcs:2017:7448,
author =	{Eric Price and Zhao Song and David P. Woodruff},
title =	{{Fast Regression with an $\ell_\infty$ Guarantee}},
booktitle =	{44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)},
pages =	{59:1--59:14},
series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN =	{978-3-95977-041-5},
ISSN =	{1868-8969},
year =	{2017},
volume =	{80},
editor =	{Ioannis Chatzigiannakis and Piotr Indyk and Fabian Kuhn and Anca Muscholl},
publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
URL =		{http://drops.dagstuhl.de/opus/volltexte/2017/7448/},
URN =		{urn:nbn:de:0030-drops-74488},
doi =		{10.4230/LIPIcs.ICALP.2017.59},
}
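
To make the sketch-and-solve pipeline above concrete, here is a minimal NumPy sketch of SRHT-based overconstrained regression. Everything in it is illustrative: the sizes `n`, `d`, the sketch dimension `m`, and the Gaussian test instance are assumptions made for this demo, not the paper's setting, and the subsampled randomized Hadamard transform is implemented naively via a fast Walsh-Hadamard transform.

```python
# Illustrative sketch-and-solve with a subsampled randomized Hadamard
# transform (SRHT). All sizes and the test instance are assumptions
# made for this demo, not taken from the paper.
import numpy as np

def fwht(X):
    """Fast Walsh-Hadamard transform along axis 0 (length must be a power of 2).

    Mutates X; returns the orthonormally scaled transform, O(n log n) per column.
    """
    n = X.shape[0]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = X[i:i + h].copy()
            b = X[i + h:i + 2 * h].copy()
            X[i:i + h] = a + b
            X[i + h:i + 2 * h] = a - b
        h *= 2
    return X / np.sqrt(n)

def srht_sketch(M, m, rng):
    """Apply S = sqrt(n/m) * P * H * D to M: random signs (D), Hadamard
    transform (H), then a uniform subsample of m rows (P)."""
    n = M.shape[0]
    signs = rng.choice([-1.0, 1.0], size=n)          # D: random diagonal signs
    HDM = fwht(signs[:, None] * M)                   # H * D * M
    rows = rng.choice(n, size=m, replace=False)      # P: pick m of n rows
    return np.sqrt(n / m) * HDM[rows]

rng = np.random.default_rng(0)
n, d, m = 4096, 50, 1000                             # n >> d; sketch size m (assumed)
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Exact least squares vs. sketch-and-solve: x' = (SA)^+ Sb.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
SAb = srht_sketch(np.hstack([A, b[:, None]]), m, rng)
x_prime, *_ = np.linalg.lstsq(SAb[:, :d], SAb[:, d], rcond=None)

err = x_prime - x_star
print("||x' - x*||_inf        :", np.abs(err).max())
print("||x' - x*||_2 / sqrt(d):", np.linalg.norm(err) / np.sqrt(d))
```

On such an instance the two printed quantities should be of comparable magnitude, in line with guarantee (1): the coordinate-wise error $\|x' - x^*\|_\infty$ tracks $\|x' - x^*\|_2 / \sqrt{d}$ up to the $d^{\gamma}$ factor, rather than concentrating on a few coordinates.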