Sharper Bounds for Regularized Data Fitting

Authors Haim Avron, Kenneth L. Clarkson, David P. Woodruff




Cite As

Haim Avron, Kenneth L. Clarkson, and David P. Woodruff. Sharper Bounds for Regularized Data Fitting. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 81, pp. 27:1-27:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Abstract

We study matrix sketching methods for regularized variants of linear regression, low rank approximation, and canonical correlation analysis. Our main focus is on sketching techniques which preserve the objective function value for regularized problems, which is an area that has remained largely unexplored. We study regularization both in a fairly broad setting, and in the specific context of the popular and widely used technique of ridge regularization; for the latter, as applied to each of these problems, we show algorithmic resource bounds in which the statistical dimension appears in places where in previous bounds the rank would appear. The statistical dimension is always smaller than the rank, and decreases as the amount of regularization increases. In particular we show this for the ridge low-rank approximation problem as well as regularized low-rank approximation problems in a much more general setting, where the regularizing function satisfies some very general conditions (chiefly, invariance under orthogonal transformations).
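To make the rank-versus-statistical-dimension comparison concrete, the sketch below computes the standard ridge statistical dimension sd_λ(A) = Σ_i σ_i² / (σ_i² + λ), where σ_i are the singular values of A. This is an illustrative NumPy snippet based on that standard definition, not code from the paper; the helper name `statistical_dimension` is our own.

```python
import numpy as np

def statistical_dimension(A, lam):
    """Ridge statistical dimension sd_lambda(A) = sum_i s_i^2 / (s_i^2 + lam),
    where s_i are the singular values of A. Equals rank(A) at lam = 0 and
    decreases monotonically as lam grows."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(s**2 / (s**2 + lam)))

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))   # full column rank almost surely

rank = np.linalg.matrix_rank(A)
print(statistical_dimension(A, 0.0))   # equals the rank: each term is 1 at lam = 0
for lam in [1.0, 10.0, 100.0]:
    # strictly below the rank for any lam > 0, and shrinking as lam grows
    print(lam, statistical_dimension(A, lam))
```

This illustrates the abstract's claim: bounds stated in terms of sd_λ(A) can only improve on bounds stated in terms of rank, with the gap widening as the regularization parameter λ increases.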
Keywords

  • Matrices
  • Regression
  • Low-rank approximation
  • Regularization
  • Canonical Correlation Analysis

