Fast Regression with an $\ell_\infty$ Guarantee

Authors Eric Price, Zhao Song, David P. Woodruff



Cite As

Eric Price, Zhao Song, and David P. Woodruff. Fast Regression with an $\ell_\infty$ Guarantee. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 59:1-59:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017). https://doi.org/10.4230/LIPIcs.ICALP.2017.59

Abstract

Sketching has emerged as a powerful technique for speeding up problems in numerical linear algebra, such as regression. In the overconstrained regression problem, one is given an n x d matrix A, with n >> d, as well as an n x 1 vector b, and one wants to find a vector x so as to minimize the residual error ||Ax-b||_2. Using the sketch-and-solve paradigm, one first computes S \cdot A and S \cdot b for a randomly chosen matrix S, then outputs x' = (SA)^{\dagger} Sb, which minimizes ||SAx' - Sb||_2.
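For concreteness, the following is a minimal Python/NumPy sketch of the sketch-and-solve paradigm. The matrix sizes n and d, the sketch dimension m, and the use of a dense Gaussian sketch (a simple stand-in for the structured transforms discussed below) are illustrative assumptions, not the paper's parameters.

  import numpy as np

  rng = np.random.default_rng(0)
  n, d, m = 4096, 32, 256   # n >> d; m is an illustrative sketch dimension

  A = rng.standard_normal((n, d))
  b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

  # Exact least-squares solution x* of min_x ||Ax - b||_2
  x_star, *_ = np.linalg.lstsq(A, b, rcond=None)

  # Sketch-and-solve: compute S*A and S*b, then solve the small m x d problem.
  S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian sketch, used as a stand-in
  x_prime, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

  print("relative residual blow-up:",
        np.linalg.norm(A @ x_prime - b) / np.linalg.norm(A @ x_star - b))

Solving the sketched m x d problem is much cheaper than solving the original n x d problem when m << n, which is the point of the paradigm.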

The sketch-and-solve paradigm gives a bound on ||x'-x^*||_2 when A is well-conditioned, where x^* denotes the minimizer of ||Ax-b||_2. Our main result is that, when S is the subsampled randomized Fourier/Hadamard transform, the error x' - x^* behaves as if it lies in a "random" direction within this bound: for any fixed direction a in R^d, we have with probability 1 - d^{-c} that
(1) \langle a, x'-x^* \rangle \lesssim \frac{\|a\|_2 \|x'-x^*\|_2}{d^{\frac{1}{2}-\gamma}},
where c, \gamma > 0 are arbitrary constants. This implies ||x'-x^*||_{\infty} is a factor d^{\frac{1}{2}-\gamma} smaller than ||x'-x^*||_2. It also gives a better bound on how well x' generalizes to new examples: if rows of A correspond to examples and columns to features, then our result bounds the error introduced by sketch-and-solve when classifying fresh examples. We show that not all oblivious subspace embeddings S satisfy these properties; in particular, we give counterexamples showing that matrices based on Count-Sketch or leverage score sampling do not.
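The following is a hedged illustration of this guarantee with the subsampled randomized Hadamard transform (SRHT) as S, comparing ||x'-x^*||_\infty against ||x'-x^*||_2 / \sqrt{d}. The power-of-two n, the sketch size m, and the dense construction of the Hadamard matrix are illustrative choices; fast implementations apply the transform in O(n log n) time instead.

  import numpy as np
  from scipy.linalg import hadamard

  rng = np.random.default_rng(1)
  n, d, m = 1024, 32, 256   # n must be a power of two for hadamard()

  A = rng.standard_normal((n, d))
  b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
  x_star, *_ = np.linalg.lstsq(A, b, rcond=None)

  # SRHT sketch S = sqrt(n/m) * P H D:
  #   D = random +/-1 signs, H = normalized Hadamard transform,
  #   P = uniform sampling of m rows.  H is formed densely here only
  #   for illustration.
  D = rng.choice([-1.0, 1.0], size=n)
  H = hadamard(n) / np.sqrt(n)
  rows = rng.choice(n, size=m, replace=False)
  S = np.sqrt(n / m) * H[rows] * D   # column-wise sign flips implement H D

  x_prime, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
  err = x_prime - x_star
  print("||x'-x*||_inf        :", np.linalg.norm(err, np.inf))
  print("||x'-x*||_2 / sqrt(d):", np.linalg.norm(err) / np.sqrt(d))

The main result predicts that, for the SRHT, the first printed quantity is comparable to the second up to a d^{\gamma} factor; the paper's counterexamples show that no such behavior is guaranteed when S is a Count-Sketch matrix or is obtained by leverage score sampling.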

We also provide lower bounds, both on how small ||x'-x^*||_2 can be and for our new guarantee (1), showing that the subsampled randomized Fourier/Hadamard transform is nearly optimal. Our lower bound on ||x'-x^*||_2 shows an O(1/\epsilon) separation between the dimension of the optimal oblivious subspace embedding required to output an x' with ||x'-x^*||_2 <= \epsilon ||Ax^*-b||_2 \cdot ||A^{\dagger}||_2 and the dimension required to output an x' with ||Ax'-b||_2 <= (1+\epsilon)||Ax^*-b||_2: the former problem requires dimension \Omega(d/\epsilon^2), while the latter can be solved with dimension O(d/\epsilon). This explains why the known upper bounds on the sketching dimensions for these two variants of regression have differed in prior work.
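Written side by side, the two guarantees compared by this separation are (a display-form restatement of the bounds above, not a new result):

  \|x' - x^*\|_2 \le \epsilon \, \|Ax^* - b\|_2 \cdot \|A^{\dagger}\|_2   % requires sketch dimension \Omega(d/\epsilon^2)
  \|Ax' - b\|_2 \le (1+\epsilon) \, \|Ax^* - b\|_2                        % achievable with sketch dimension O(d/\epsilon)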


Keywords
  • Linear regression
  • Count-Sketch
  • Gaussians
  • Leverage scores
  • $\ell_\infty$-guarantee
