Spectrum Approximation Beyond Fast Matrix Multiplication: Algorithms and Hardness

Authors: Cameron Musco, Praneeth Netrapalli, Aaron Sidford, Shashanka Ubaru, and David P. Woodruff



File
  • LIPIcs.ITCS.2018.8.pdf (0.66 MB, 21 pages)

Cite As

Cameron Musco, Praneeth Netrapalli, Aaron Sidford, Shashanka Ubaru, and David P. Woodruff. Spectrum Approximation Beyond Fast Matrix Multiplication: Algorithms and Hardness. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 94, pp. 8:1-8:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.ITCS.2018.8

Abstract

Understanding the singular value spectrum of an n x n matrix A is a fundamental task in countless numerical computation and data analysis applications. In matrix multiplication time, it is possible to perform a full SVD of A and directly compute the singular values \sigma_1,...,\sigma_n. However, little is known about algorithms that break this runtime barrier.

Using tools from stochastic trace estimation, polynomial approximation, and fast linear system solvers, we show how to efficiently isolate different ranges of A's spectrum and approximate the number of singular values in these ranges. We thus effectively compute an approximate histogram of the spectrum, which can stand in for the true singular values in many applications.
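
To make the histogram primitive concrete, below is a minimal Python sketch of the general recipe, in the spirit of the eigenvalue-count estimators of Di Napoli, Polizzi, and Saad (reference 16 below): approximate the indicator function of an interval by a damped Chebyshev polynomial, apply that polynomial matrix-free via the three-term recurrence, and estimate its trace with random Rademacher probes. All names and parameters here are illustrative; the paper's actual algorithm additionally leverages fast linear system solvers to sharpen the filters, which this sketch omits.

    import numpy as np

    def cheb_step_coeffs(a, b, degree):
        # Chebyshev coefficients of the indicator of [a, b] within [-1, 1],
        # with Jackson damping to suppress Gibbs oscillations.
        k = np.arange(1, degree + 1)
        c = np.empty(degree + 1)
        c[0] = (np.arccos(a) - np.arccos(b)) / np.pi
        c[1:] = (2.0 / (np.pi * k)) * (np.sin(k * np.arccos(a)) - np.sin(k * np.arccos(b)))
        alpha = np.pi / (degree + 1)
        c[1:] *= ((degree - k + 1) * np.cos(k * alpha)
                  + np.sin(k * alpha) / np.tan(alpha)) / (degree + 1)
        return c

    def count_eigs(matvec, n, a, b, degree=100, probes=30, seed=0):
        # Hutchinson estimate of tr(p(M)) ~ #{eigenvalues of M in [a, b]},
        # where matvec(x) returns M @ x and the spectrum of M lies in [-1, 1].
        rng = np.random.default_rng(seed)
        c = cheb_step_coeffs(a, b, degree)
        total = 0.0
        for _ in range(probes):
            z = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
            t_prev, t_cur = z, matvec(z)          # T_0(M) z and T_1(M) z
            acc = c[0] * t_prev + c[1] * t_cur
            for j in range(2, degree + 1):
                # Three-term recurrence: T_j(M) z = 2 M T_{j-1}(M) z - T_{j-2}(M) z
                t_prev, t_cur = t_cur, 2.0 * matvec(t_cur) - t_prev
                acc += c[j] * t_cur
            total += z @ acc                      # z^T p(M) z, unbiased for tr(p(M))
        return total / probes

To count singular values of A below a threshold t, one would apply this to the rescaled M = 2 A^T A / \sigma_1^2 - I, whose spectrum lies in [-1, 1], mapping the threshold to the eigenvalue endpoint 2 t^2 / \sigma_1^2 - 1.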

We use our histogram primitive to give the first algorithms for approximating a wide class of symmetric matrix norms and spectral sums faster than the best known runtime for matrix multiplication. For example, we show how to obtain a (1 + \epsilon) approximation to the Schatten 1-norm (i.e., the nuclear or trace norm) in just \tilde O((nnz(A) n^{1/3} + n^2) \epsilon^{-3}) time for A with uniform row sparsity, or \tilde O(n^{2.18} \epsilon^{-3}) time for dense matrices. The runtime scales smoothly for general Schatten-p norms, notably becoming \tilde O(p \cdot nnz(A) \epsilon^{-3}) for any real p >= 2.
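
Given such a histogram, a Schatten norm estimate is just a weighted sum over bins. The helper below is a hypothetical illustration (not the paper's interface), assuming bin edges spaced by a (1 + \epsilon) ratio so that every singular value sits within a small factor of its bin's representative:

    import numpy as np

    def schatten_p_from_histogram(bin_edges, counts, p):
        # Approximate ||A||_p = (sum_i sigma_i^p)^(1/p) by standing in each
        # bin's geometric midpoint for all singular values counted in that
        # bin. With edges spaced by a (1 + eps) ratio, each sigma_i^p is
        # recovered to within a (1 + eps)^(p/2) factor.
        edges = np.asarray(bin_edges, dtype=float)
        reps = np.sqrt(edges[:-1] * edges[1:])    # geometric bin midpoints
        return float(np.asarray(counts) @ reps ** p) ** (1.0 / p)

Calling a count estimator like count_eigs above once per bin of a geometrically spaced grid yields the counts argument; p = 1 then gives a nuclear norm estimate.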
	
At the same time, we show that the complexity of spectrum approximation is inherently tied to fast matrix multiplication in the small \epsilon regime. We use fine-grained complexity to give conditional lower bounds for spectrum approximation, showing that achieving milder \epsilon dependencies in our algorithms would imply triangle detection algorithms for general graphs running faster than state-of-the-art matrix multiplication time. This further implies, through a reduction of (Vassilevska Williams & Williams, 2010), that highly accurate spectrum approximation algorithms running in subcubic time can be used to give subcubic time matrix multiplication. As an application of our bounds, we show that precisely computing all effective resistances in a graph in less than matrix multiplication time is likely difficult, barring a major algorithmic breakthrough.
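
To see why effective resistances fall under this framework, recall two standard electrical-network identities (textbook facts, stated here for orientation rather than taken from the paper's proof): for a connected graph on n vertices with Laplacian L, Laplacian pseudoinverse L^+, and eigenvalues 0 = \lambda_1 < \lambda_2 <= ... <= \lambda_n,

    R_{\mathrm{eff}}(u,v) = (e_u - e_v)^\top L^{+} (e_u - e_v),
    \qquad
    \sum_{u < v} R_{\mathrm{eff}}(u,v) = n \sum_{i=2}^{n} \frac{1}{\lambda_i} = n \cdot \mathrm{tr}(L^{+}).

Computing all effective resistances exactly therefore determines the spectral sum tr(L^+), which is precisely the kind of quantity the conditional lower bounds constrain.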

Keywords
  • spectrum approximation
  • matrix norm computation
  • fine-grained complexity
  • linear algebra

References

  1. Zeyuan Allen-Zhu and Yuanzhi Li. Faster principal component regression and stable matrix Chebyshev approximation. In Proceedings of the 34th International Conference on Machine Learning (ICML), 2017.
  2. Orly Alter, Patrick O. Brown, and David Botstein. Singular value decomposition for genome-wide expression data processing and modeling. Proceedings of the National Academy of Sciences, 97(18):10101-10106, 2000.
  3. Alexandr Andoni, Robert Krauthgamer, and Ilya P. Razenshteyn. Sketching and embedding are equivalent for norms. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC), pages 479-488, 2015.
  4. Haim Avron and Sivan Toledo. Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix. Journal of the ACM, 58(2):8, 2011.
  5. Monami Banerjee and Nikhil R. Pal. Feature selection with SVD entropy: some modification and extension. Information Sciences, 264:118-134, 2014.
  6. Walter Baur and Volker Strassen. The complexity of partial derivatives. Theoretical Computer Science, 22(3):317-330, 1983.
  7. Constantine Bekas, Alessandro Curioni, and I. Fedulova. Low cost high performance uncertainty quantification. In Proceedings of the 2nd Workshop on High Performance Computational Finance, page 8. ACM, 2009.
  8. Christos Boutsidis, Petros Drineas, Prabhanjan Kambadur, and Anastasios Zouzias. A randomized algorithm for approximating the log determinant of a symmetric positive definite matrix. Linear Algebra and its Applications, 533:95-119, 2017.
  9. Vladimir Braverman, Stephen R. Chestnut, Robert Krauthgamer, and Lin F. Yang. Sketches for matrix norms: Faster, smaller and more general. http://arxiv.org/abs/1609.05885, 2016.
  10. Emmanuel J. Candès and Benjamin Recht. Exact matrix completion via convex optimization. Communications of the ACM, 55(6):111-119, 2012.
  11. Petre Caraiani. The predictive power of singular value decomposition entropy for stock market dynamics. Physica A: Statistical Mechanics and its Applications, 393:571-578, 2014.
  12. Michael B. Cohen, Yin Tat Lee, Cameron Musco, Christopher Musco, Richard Peng, and Aaron Sidford. Uniform sampling for matrix approximation. In Proceedings of the 6th Conference on Innovations in Theoretical Computer Science (ITCS), pages 181-190, 2015.
  13. Jason V. Davis, Brian Kulis, Prateek Jain, Suvrit Sra, and Inderjit S. Dhillon. Information-theoretic metric learning. In Proceedings of the 24th International Conference on Machine Learning (ICML), pages 209-216, 2007.
  14. James Demmel. An arithmetic complexity lower bound for computing rational functions, with applications to linear algebra. Submitted to SIMAX, 2013.
  15. Amit Deshpande, Madhur Tulsiani, and Nisheeth K. Vishnoi. Algorithms and hardness for subspace approximation. In Proceedings of the 22nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 482-496, 2011.
  16. Edoardo Di Napoli, Eric Polizzi, and Yousef Saad. Efficient estimation of eigenvalue counts in an interval. Numerical Linear Algebra with Applications, 2016.
  17. Haishun Du, Qingpu Hu, Manman Jiang, and Fan Zhang. Two-dimensional principal component analysis based on Schatten p-norm for image feature extraction. Journal of Visual Communication and Image Representation, 32:55-62, 2015.
  18. J. K. Fitzsimons, M. A. Osborne, S. J. Roberts, and J. F. Fitzsimons. Improved stochastic trace estimation using mutually unbiased bases. http://arxiv.org/abs/1608.00117, 2016.
  19. Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432-441, 2008.
  20. Roy Frostig, Cameron Musco, Christopher Musco, and Aaron Sidford. Principal component projection without principal component analysis. In Proceedings of the 33rd International Conference on Machine Learning (ICML), pages 2349-2357, 2016. URL: http://jmlr.org/proceedings/papers/v48/frostig16.html.
  21. François Le Gall and Florent Urrutia. Improved rectangular matrix multiplication using powers of the Coppersmith-Winograd tensor. http://arxiv.org/abs/1708.05622, 2017.
  22. Gene H. Golub and Charles F. Van Loan. Matrix Computations, volume 3. JHU Press, 2012.
  23. Alon Gonen, Francesco Orabona, and Shai Shalev-Shwartz. Solving ridge regression using sketched preconditioned SVRG. In Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016.
  24. Rongbao Gu, Wei Xiong, and Xinjie Li. Does the singular value decomposition entropy have predictive power for stock market? Evidence from the Shenzhen stock market. Physica A: Statistical Mechanics and its Applications, 439:103-113, 2015.
  25. Ivan Gutman. Total π-electron energy of benzenoid hydrocarbons. In Advances in the Theory of Benzenoid Hydrocarbons II, pages 29-63. Springer, 1992.
  26. Ivan Gutman. The energy of a graph: old and new results. In Algebraic Combinatorics and Applications, pages 196-211. Springer, 2001.
  27. Insu Han, Dmitry Malioutov, Haim Avron, and Jinwoo Shin. Approximating the spectral sums of large-scale matrices using Chebyshev approximations. SIAM Journal on Scientific Computing, 39(4), 2017.
  28. Moritz Hardt, Katrina Ligett, and Frank McSherry. A simple and practical algorithm for differentially private data release. In Advances in Neural Information Processing Systems 25 (NIPS), pages 2348-2356, 2012.
  29. Nicholas J. Higham. Functions of Matrices: Theory and Computation. SIAM, 2008.
  30. Michael F. Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics - Simulation and Computation, 19(2):433-450, 1990.
  31. Prateek Jain, Praneeth Netrapalli, and Sujay Sanghavi. Low-rank matrix completion using alternating minimization. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC), pages 665-674, 2013.
  32. Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems 26 (NIPS), pages 315-323, 2013.
  33. Ian Jolliffe. Principal Component Analysis. Wiley Online Library, 2002.
  34. Ashish Khetan and Sewoong Oh. Matrix norm estimation from a few entries. In Advances in Neural Information Processing Systems 30 (NIPS), 2017.
  35. V. V. Klyuev and N. I. Kokovkin-Shcherbak. Minimization of the number of arithmetic operations in the solution of linear algebra systems of equations. USSR Computational Mathematics and Mathematical Physics, 5(1):25-43, 1965.
  36. N. I. Kokovkin-Shcherbak. Minimization of numerical algorithms for solving arbitrary systems of linear equations. Ukrainskii Matematicheskii Zhurnal, 22(4):494-502, 1970.
  37. Weihao Kong and Gregory Valiant. Spectrum estimation from samples. Annals of Statistics, 2016.
  38. J. Kuczyński and H. Woźniakowski. Estimating the largest eigenvalue by the power and Lanczos algorithms with a random start. SIAM Journal on Matrix Analysis and Applications, 13(4):1094-1122, 1992.
  39. François Le Gall. Faster algorithms for rectangular matrix multiplication. In Proceedings of the 53rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2012.
  40. Yin Tat Lee and Aaron Sidford. Path finding methods for linear programming: Solving linear programs in Õ(√rank) iterations and faster algorithms for maximum flow. In Proceedings of the 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 424-433, 2014.
  41. Yin Tat Lee, Aaron Sidford, and Sam Chiu-wai Wong. A faster cutting plane method and its implications for combinatorial and convex optimization. In Proceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2015.
  42. Chao Li and Gerome Miklau. Measuring the achievable error of query sets under differential privacy. http://arxiv.org/abs/1202.3399, 2012.
  43. Mu Li, Gary L. Miller, and Richard Peng. Iterative row sampling. In Proceedings of the 54th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2013.
  44. Yi Li, Huy L. Nguyen, and David P. Woodruff. On sketching matrix norms and the top singular vector. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing (STOC), pages 1562-1581, 2014.
  45. Yi Li and David P. Woodruff. On approximating functions of the singular values in a stream. In Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC), 2016.
  46. Yi Li and David P. Woodruff. Embeddings of Schatten norms with applications to data streams. In Proceedings of the 44th International Colloquium on Automata, Languages, and Programming (ICALP), pages 60:1-60:14, 2017. URL: http://dx.doi.org/10.4230/LIPIcs.ICALP.2017.60.
  47. Lin Lin, Yousef Saad, and Chao Yang. Approximating spectral densities of large matrices. SIAM Review, 58(1):34-65, 2016.
  48. Yu Lu and Sahand N. Negahban. Individualized rank aggregation using nuclear norm regularization. In Proceedings of the 53rd Annual Allerton Conference on Communication, Control, and Computing, pages 1473-1479. IEEE, 2015.
  49. Lei Luo, Jian Yang, Jinhui Chen, and Yicheng Gao. Schatten p-norm based matrix regression model for image classification. In Pattern Recognition. Springer, 2014.
  50. Aleksander Madry, Damian Straszak, and Jakub Tarnawski. Fast generation of random spanning trees and the effective resistance metric. In Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2019-2036, 2015.
  51. Jacques Morgenstern. Note on a lower bound on the linear complexity of the fast Fourier transform. Journal of the ACM, 20(2):305-306, 1973.
  52. Cameron Musco and Christopher Musco. Randomized block Krylov methods for stronger and faster approximate singular value decomposition. In Advances in Neural Information Processing Systems 28 (NIPS), pages 1396-1404, 2015.
  53. Cameron Musco, Praneeth Netrapalli, Aaron Sidford, Shashanka Ubaru, and David P. Woodruff. Spectrum approximation beyond fast matrix multiplication: Algorithms and hardness. http://arxiv.org/abs/1704.04163, 2017.
  54. Praneeth Netrapalli, U. N. Niranjan, Sujay Sanghavi, Animashree Anandkumar, and Prateek Jain. Non-convex robust PCA. In Advances in Neural Information Processing Systems 27 (NIPS), pages 1107-1115, 2014.
  55. Feiping Nie, Heng Huang, and Chris Ding. Low-rank matrix recovery via efficient Schatten p-norm minimization. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.
  56. Beresford N. Parlett. The Symmetric Eigenvalue Problem. SIAM, 1998.
  57. Carl Edward Rasmussen. Gaussian processes in machine learning. In Advanced Lectures on Machine Learning, pages 63-71. Springer, 2004.
  58. Ran Raz and Amir Shpilka. Lower bounds for matrix product in bounded depth circuits with arbitrary gates. SIAM Journal on Computing, 32(2):488-513, 2003.
  59. Farbod Roosta-Khorasani and Uri Ascher. Improved bounds on sample size for implicit matrix trace estimators. Foundations of Computational Mathematics, 2015.
  60. Yousef Saad. Numerical Methods for Large Eigenvalue Problems: Revised Edition, volume 66. SIAM, 2011.
  61. Shai Shalev-Shwartz and Tong Zhang. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. In Proceedings of the 31st International Conference on Machine Learning (ICML), pages 64-72, 2014.
  62. Daniel A. Spielman and Nikhil Srivastava. Graph sparsification by effective resistances. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC), 2008.
  63. Daniel A. Spielman and Shang-Hua Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), pages 81-90, 2004.
  64. Andreas Stathopoulos, Jesse Laeuchli, and Kostas Orginos. Hierarchical probing for estimating the trace of the matrix inverse on toroidal lattices. SIAM Journal on Scientific Computing, 35(5):S299-S322, 2013.
  65. Lloyd N. Trefethen and David Bau. Numerical Linear Algebra. SIAM, 1997.
  66. Shashanka Ubaru, Jie Chen, and Yousef Saad. Fast estimation of tr(f(A)) via stochastic Lanczos quadrature. SIAM Journal on Matrix Analysis and Applications (SIMAX), 2017.
  67. Shashanka Ubaru and Yousef Saad. Fast methods for estimating the numerical rank of large matrices. In Proceedings of the 33rd International Conference on Machine Learning (ICML), pages 468-477, 2016.
  68. Shashanka Ubaru, Yousef Saad, and Abd-Krim Seghouane. Fast estimation of approximate matrix ranks using spectral densities. Neural Computation, 2017.
  69. Roy Varshavsky, Assaf Gottlieb, Michal Linial, and David Horn. Novel unsupervised feature filtering of biological data. Bioinformatics, 22(14):e507-e513, 2006.
  70. Virginia Vassilevska Williams. Multiplying matrices faster than Coppersmith-Winograd. In Proceedings of the 44th Annual ACM Symposium on Theory of Computing (STOC), 2012.
  71. Virginia Vassilevska Williams. Hardness of easy problems: Basing hardness on popular conjectures such as the strong exponential time hypothesis (invited talk). In Proceedings of the 10th International Symposium on Parameterized and Exact Computation (IPEC), pages 17-29, 2015.
  72. Virginia Vassilevska Williams and Ryan Williams. Subcubic equivalences between path, matrix and triangle problems. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 645-654, 2010.
  73. Shmuel Winograd. Arithmetic Complexity of Computations. CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, 1987. URL: http://dx.doi.org/10.1137/1.9781611970364.
  74. David P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10(1-2):1-157, 2014.
  75. Y. Xie, Y. Qu, D. Tao, W. Wu, Q. Yuan, and W. Zhang. Hyperspectral image restoration via iteratively regularized weighted Schatten p-norm minimization. IEEE Transactions on Geoscience and Remote Sensing, PP(99):1-18, 2016.
  76. Yuan Xie, Shuhang Gu, Yan Liu, Wangmeng Zuo, Wensheng Zhang, and Lei Zhang. Weighted Schatten p-norm minimization for image denoising and background subtraction. IEEE Transactions on Image Processing, 25(10):4842-4857, 2016.
  77. Yuchen Zhang, Martin J. Wainwright, and Michael I. Jordan. Distributed estimation of generalized matrix rank: Efficient algorithms and lower bounds. In Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015.