Spectrum Approximation Beyond Fast Matrix Multiplication: Algorithms and Hardness

Authors: Cameron Musco, Praneeth Netrapalli, Aaron Sidford, Shashanka Ubaru, and David P. Woodruff



File
  • LIPIcs.ITCS.2018.8.pdf (0.66 MB, 21 pages)

Cite As

Cameron Musco, Praneeth Netrapalli, Aaron Sidford, Shashanka Ubaru, and David P. Woodruff. Spectrum Approximation Beyond Fast Matrix Multiplication: Algorithms and Hardness. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 94, pp. 8:1-8:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.ITCS.2018.8

Abstract

Understanding the singular value spectrum of an n x n matrix A is a fundamental task in countless numerical computation and data analysis applications. In matrix multiplication time, it is possible to perform a full SVD of A and directly compute the singular values \sigma_1,...,\sigma_n. However, little is known about algorithms that break this runtime barrier.

Using tools from stochastic trace estimation, polynomial approximation, and fast linear system solvers, we show how to efficiently isolate different ranges of A's spectrum and approximate the number of singular values in these ranges. We thus effectively compute an approximate histogram of the spectrum, which can stand in for the true singular values in many applications.
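
To make the histogram primitive concrete, below is a minimal Python sketch of the general recipe, in the spirit of the eigenvalue-count estimators of Di Napoli, Polizzi, and Saad (reference 16 below): approximate the indicator function of an interval by a damped Chebyshev polynomial, apply that polynomial matrix-free via the three-term recurrence, and estimate its trace with random Rademacher probes. All names and parameters here are illustrative; the paper's actual algorithm additionally leverages fast linear system solvers to sharpen the filters, which this sketch omits.

    import numpy as np

    def cheb_step_coeffs(a, b, degree):
        # Chebyshev coefficients of the indicator of [a, b] within [-1, 1],
        # with Jackson damping to suppress Gibbs oscillations.
        k = np.arange(1, degree + 1)
        c = np.empty(degree + 1)
        c[0] = (np.arccos(a) - np.arccos(b)) / np.pi
        c[1:] = (2.0 / (np.pi * k)) * (np.sin(k * np.arccos(a)) - np.sin(k * np.arccos(b)))
        alpha = np.pi / (degree + 1)
        c[1:] *= ((degree - k + 1) * np.cos(k * alpha)
                  + np.sin(k * alpha) / np.tan(alpha)) / (degree + 1)
        return c

    def count_eigs(matvec, n, a, b, degree=100, probes=30, seed=0):
        # Hutchinson estimate of tr(p(M)) ~ #{eigenvalues of M in [a, b]},
        # where matvec(x) returns M @ x and the spectrum of M lies in [-1, 1].
        rng = np.random.default_rng(seed)
        c = cheb_step_coeffs(a, b, degree)
        total = 0.0
        for _ in range(probes):
            z = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
            t_prev, t_cur = z, matvec(z)          # T_0(M) z and T_1(M) z
            acc = c[0] * t_prev + c[1] * t_cur
            for j in range(2, degree + 1):
                # Three-term recurrence: T_j(M) z = 2 M T_{j-1}(M) z - T_{j-2}(M) z
                t_prev, t_cur = t_cur, 2.0 * matvec(t_cur) - t_prev
                acc += c[j] * t_cur
            total += z @ acc                      # z^T p(M) z, unbiased for tr(p(M))
        return total / probes

To count singular values of A below a threshold t, one would apply this to the rescaled M = 2 A^T A / \sigma_1^2 - I, whose spectrum lies in [-1, 1], mapping the threshold to the eigenvalue endpoint 2 t^2 / \sigma_1^2 - 1.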

We use our histogram primitive to give the first algorithms for approximating a wide class of symmetric matrix norms and spectral sums faster than the best known runtime for matrix multiplication. For example, we show how to obtain a (1 + \epsilon) approximation to the Schatten 1-norm (i.e., the nuclear or trace norm) in just \tilde O((nnz(A) n^{1/3} + n^2) \epsilon^{-3}) time for A with uniform row sparsity, or \tilde O(n^{2.18} \epsilon^{-3}) time for dense matrices. The runtime scales smoothly for general Schatten-p norms, notably becoming \tilde O(p \cdot nnz(A) \epsilon^{-3}) for any real p >= 2.
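
Given such a histogram, a Schatten norm estimate is just a weighted sum over bins. The helper below is a hypothetical illustration (not the paper's interface), assuming bin edges spaced by a (1 + \epsilon) ratio so that every singular value sits within a small factor of its bin's representative:

    import numpy as np

    def schatten_p_from_histogram(bin_edges, counts, p):
        # Approximate ||A||_p = (sum_i sigma_i^p)^(1/p) by standing in each
        # bin's geometric midpoint for all singular values counted in that
        # bin. With edges spaced by a (1 + eps) ratio, each sigma_i^p is
        # recovered to within a (1 + eps)^(p/2) factor.
        edges = np.asarray(bin_edges, dtype=float)
        reps = np.sqrt(edges[:-1] * edges[1:])    # geometric bin midpoints
        return float(np.asarray(counts) @ reps ** p) ** (1.0 / p)

Calling a count estimator like count_eigs above once per bin of a geometrically spaced grid yields the counts argument; p = 1 then gives a nuclear norm estimate.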
	
At the same time, we show that the complexity of spectrum approximation is inherently tied to fast matrix multiplication in the small \epsilon regime. We use fine-grained complexity to give conditional lower bounds for spectrum approximation, showing that achieving milder \epsilon dependencies in our algorithms would imply triangle detection algorithms for general graphs running faster than state-of-the-art matrix multiplication time. This further implies, through a reduction of (Vassilevska Williams & Williams, 2010), that highly accurate spectrum approximation algorithms running in subcubic time can be used to give subcubic time matrix multiplication. As an application of our bounds, we show that precisely computing all effective resistances in a graph in less than matrix multiplication time is likely difficult, barring a major algorithmic breakthrough.
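
To see why effective resistances fall under this framework, recall two standard electrical-network identities (textbook facts, stated here for orientation rather than taken from the paper's proof): for a connected graph on n vertices with Laplacian L, Laplacian pseudoinverse L^+, and eigenvalues 0 = \lambda_1 < \lambda_2 <= ... <= \lambda_n,

    R_{\mathrm{eff}}(u,v) = (e_u - e_v)^\top L^{+} (e_u - e_v),
    \qquad
    \sum_{u < v} R_{\mathrm{eff}}(u,v) = n \sum_{i=2}^{n} \frac{1}{\lambda_i} = n \cdot \mathrm{tr}(L^{+}).

Computing all effective resistances exactly therefore determines the spectral sum tr(L^+), which is precisely the kind of quantity the conditional lower bounds constrain.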

Keywords
  • spectrum approximation
  • matrix norm computation
  • fine-grained complexity
  • linear algebra

References

  1. Zeyuan Allen-Zhu and Yuanzhi Li. Faster principal component regression and stable matrix Chebyshev approximation. In Proceedings of the 34th International Conference on Machine Learning (ICML), 2017.
  2. Orly Alter, Patrick O. Brown, and David Botstein. Singular value decomposition for genome-wide expression data processing and modeling. Proceedings of the National Academy of Sciences, 97(18):10101-10106, 2000.
  3. Alexandr Andoni, Robert Krauthgamer, and Ilya P. Razenshteyn. Sketching and embedding are equivalent for norms. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC), pages 479-488, 2015.
  4. Haim Avron and Sivan Toledo. Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix. Journal of the ACM, 58(2):8, 2011.
  5. Monami Banerjee and Nikhil R. Pal. Feature selection with SVD entropy: some modification and extension. Information Sciences, 264:118-134, 2014.
  6. Walter Baur and Volker Strassen. The complexity of partial derivatives. Theoretical Computer Science, 22(3):317-330, 1983.
  7. Constantine Bekas, Alessandro Curioni, and I. Fedulova. Low cost high performance uncertainty quantification. In Proceedings of the 2nd Workshop on High Performance Computational Finance, page 8. ACM, 2009.
  8. Christos Boutsidis, Petros Drineas, Prabhanjan Kambadur, and Anastasios Zouzias. A randomized algorithm for approximating the log determinant of a symmetric positive definite matrix. Linear Algebra and its Applications, 533:95-119, 2017.
  9. Vladimir Braverman, Stephen R. Chestnut, Robert Krauthgamer, and Lin F. Yang. Sketches for matrix norms: Faster, smaller and more general. http://arxiv.org/abs/1609.05885, 2016.
  10. Emmanuel J. Candès and Benjamin Recht. Exact matrix completion via convex optimization. Communications of the ACM, 55(6):111-119, 2012.
  11. Petre Caraiani. The predictive power of singular value decomposition entropy for stock market dynamics. Physica A: Statistical Mechanics and its Applications, 393:571-578, 2014.
  12. Michael B. Cohen, Yin Tat Lee, Cameron Musco, Christopher Musco, Richard Peng, and Aaron Sidford. Uniform sampling for matrix approximation. In Proceedings of the 6th Conference on Innovations in Theoretical Computer Science (ITCS), pages 181-190, 2015.
  13. Jason V. Davis, Brian Kulis, Prateek Jain, Suvrit Sra, and Inderjit S. Dhillon. Information-theoretic metric learning. In Proceedings of the 24th International Conference on Machine Learning (ICML), pages 209-216, 2007.
  14. James Demmel. An arithmetic complexity lower bound for computing rational functions, with applications to linear algebra. Submitted to SIMAX, 2013.
  15. Amit Deshpande, Madhur Tulsiani, and Nisheeth K. Vishnoi. Algorithms and hardness for subspace approximation. In Proceedings of the 22nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 482-496, 2011.
  16. Edoardo Di Napoli, Eric Polizzi, and Yousef Saad. Efficient estimation of eigenvalue counts in an interval. Numerical Linear Algebra with Applications, 2016.
  17. Haishun Du, Qingpu Hu, Manman Jiang, and Fan Zhang. Two-dimensional principal component analysis based on Schatten p-norm for image feature extraction. Journal of Visual Communication and Image Representation, 32:55-62, 2015.
  18. J. K. Fitzsimons, M. A. Osborne, S. J. Roberts, and J. F. Fitzsimons. Improved stochastic trace estimation using mutually unbiased bases. http://arxiv.org/abs/1608.00117, 2016.
  19. Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432-441, 2008.
  20. Roy Frostig, Cameron Musco, Christopher Musco, and Aaron Sidford. Principal component projection without principal component analysis. In Proceedings of the 33rd International Conference on Machine Learning (ICML), pages 2349-2357, 2016. URL: http://jmlr.org/proceedings/papers/v48/frostig16.html.
  21. François Le Gall and Florent Urrutia. Improved rectangular matrix multiplication using powers of the Coppersmith-Winograd tensor. http://arxiv.org/abs/1708.05622, 2017.
  22. Gene H. Golub and Charles F. Van Loan. Matrix Computations, volume 3. JHU Press, 2012.
  23. Alon Gonen, Francesco Orabona, and Shai Shalev-Shwartz. Solving ridge regression using sketched preconditioned SVRG. In Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016.
  24. Rongbao Gu, Wei Xiong, and Xinjie Li. Does the singular value decomposition entropy have predictive power for stock market? Evidence from the Shenzhen stock market. Physica A: Statistical Mechanics and its Applications, 439:103-113, 2015.
  25. Ivan Gutman. Total π-electron energy of benzenoid hydrocarbons. In Advances in the Theory of Benzenoid Hydrocarbons II, pages 29-63. Springer, 1992.
  26. Ivan Gutman. The energy of a graph: old and new results. In Algebraic Combinatorics and Applications, pages 196-211. Springer, 2001.
  27. Insu Han, Dmitry Malioutov, Haim Avron, and Jinwoo Shin. Approximating the spectral sums of large-scale matrices using Chebyshev approximations. SIAM Journal on Scientific Computing, 39(4), 2017.
  28. Moritz Hardt, Katrina Ligett, and Frank McSherry. A simple and practical algorithm for differentially private data release. In Advances in Neural Information Processing Systems 25 (NIPS), pages 2348-2356, 2012.
  29. Nicholas J. Higham. Functions of Matrices: Theory and Computation. SIAM, 2008.
  30. Michael F. Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics - Simulation and Computation, 19(2):433-450, 1990.
  31. Prateek Jain, Praneeth Netrapalli, and Sujay Sanghavi. Low-rank matrix completion using alternating minimization. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC), pages 665-674, 2013.
  32. Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems 26 (NIPS), pages 315-323, 2013.
  33. Ian Jolliffe. Principal Component Analysis. Wiley Online Library, 2002.
  34. Ashish Khetan and Sewoong Oh. Matrix norm estimation from a few entries. In Advances in Neural Information Processing Systems 30 (NIPS), 2017.
  35. V. V. Klyuev and N. I. Kokovkin-Shcherbak. Minimization of the number of arithmetic operations in the solution of linear algebra systems of equations. USSR Computational Mathematics and Mathematical Physics, 5(1):25-43, 1965.
  36. N. I. Kokovkin-Shcherbak. Minimization of numerical algorithms for solving arbitrary systems of linear equations. Ukrainskii Matematicheskii Zhurnal, 22(4):494-502, 1970.
  37. Weihao Kong and Gregory Valiant. Spectrum estimation from samples. Annals of Statistics, 2016.
  38. J. Kuczyński and H. Woźniakowski. Estimating the largest eigenvalue by the power and Lanczos algorithms with a random start. SIAM Journal on Matrix Analysis and Applications, 13(4):1094-1122, 1992.
  39. François Le Gall. Faster algorithms for rectangular matrix multiplication. In Proceedings of the 53rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2012.
  40. Yin Tat Lee and Aaron Sidford. Path finding methods for linear programming: Solving linear programs in Õ(√rank) iterations and faster algorithms for maximum flow. In Proceedings of the 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 424-433, 2014.
  41. Yin Tat Lee, Aaron Sidford, and Sam Chiu-wai Wong. A faster cutting plane method and its implications for combinatorial and convex optimization. In Proceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2015.
  42. Chao Li and Gerome Miklau. Measuring the achievable error of query sets under differential privacy. http://arxiv.org/abs/1202.3399, 2012.
  43. Mu Li, Gary L. Miller, and Richard Peng. Iterative row sampling. In Proceedings of the 54th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2013.
  44. Yi Li, Huy L. Nguyen, and David P. Woodruff. On sketching matrix norms and the top singular vector. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing (STOC), pages 1562-1581, 2014.
  45. Yi Li and David P. Woodruff. On approximating functions of the singular values in a stream. In Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC), 2016.
  46. Yi Li and David P. Woodruff. Embeddings of Schatten norms with applications to data streams. In Proceedings of the 44th International Colloquium on Automata, Languages, and Programming (ICALP), pages 60:1-60:14, 2017. URL: http://dx.doi.org/10.4230/LIPIcs.ICALP.2017.60.
  47. Lin Lin, Yousef Saad, and Chao Yang. Approximating spectral densities of large matrices. SIAM Review, 58(1):34-65, 2016.
  48. Yu Lu and Sahand N. Negahban. Individualized rank aggregation using nuclear norm regularization. In Proceedings of the 53rd Annual Allerton Conference on Communication, Control, and Computing, pages 1473-1479. IEEE, 2015.
  49. Lei Luo, Jian Yang, Jinhui Chen, and Yicheng Gao. Schatten p-norm based matrix regression model for image classification. In Pattern Recognition. Springer, 2014.
  50. Aleksander Madry, Damian Straszak, and Jakub Tarnawski. Fast generation of random spanning trees and the effective resistance metric. In Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2019-2036, 2015.
  51. Jacques Morgenstern. Note on a lower bound on the linear complexity of the fast Fourier transform. Journal of the ACM, 20(2):305-306, 1973.
  52. Cameron Musco and Christopher Musco. Randomized block Krylov methods for stronger and faster approximate singular value decomposition. In Advances in Neural Information Processing Systems 28 (NIPS), pages 1396-1404, 2015.
  53. Cameron Musco, Praneeth Netrapalli, Aaron Sidford, Shashanka Ubaru, and David P. Woodruff. Spectrum approximation beyond fast matrix multiplication: Algorithms and hardness. http://arxiv.org/abs/1704.04163, 2017.
  54. Praneeth Netrapalli, U. N. Niranjan, Sujay Sanghavi, Animashree Anandkumar, and Prateek Jain. Non-convex robust PCA. In Advances in Neural Information Processing Systems 27 (NIPS), pages 1107-1115, 2014.
  55. Feiping Nie, Heng Huang, and Chris Ding. Low-rank matrix recovery via efficient Schatten p-norm minimization. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.
  56. Beresford N. Parlett. The Symmetric Eigenvalue Problem. SIAM, 1998.
  57. Carl Edward Rasmussen. Gaussian processes in machine learning. In Advanced Lectures on Machine Learning, pages 63-71. Springer, 2004.
  58. Ran Raz and Amir Shpilka. Lower bounds for matrix product in bounded depth circuits with arbitrary gates. SIAM Journal on Computing, 32(2):488-513, 2003.
  59. Farbod Roosta-Khorasani and Uri Ascher. Improved bounds on sample size for implicit matrix trace estimators. Foundations of Computational Mathematics, 2015.
  60. Yousef Saad. Numerical Methods for Large Eigenvalue Problems: Revised Edition, volume 66. SIAM, 2011.
  61. Shai Shalev-Shwartz and Tong Zhang. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. In Proceedings of the 31st International Conference on Machine Learning (ICML), pages 64-72, 2014.
  62. Daniel A. Spielman and Nikhil Srivastava. Graph sparsification by effective resistances. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC), 2008.
  63. Daniel A. Spielman and Shang-Hua Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), pages 81-90, 2004.
  64. Andreas Stathopoulos, Jesse Laeuchli, and Kostas Orginos. Hierarchical probing for estimating the trace of the matrix inverse on toroidal lattices. SIAM Journal on Scientific Computing, 35(5):S299-S322, 2013.
  65. Lloyd N. Trefethen and David Bau. Numerical Linear Algebra. SIAM, 1997.
  66. Shashanka Ubaru, Jie Chen, and Yousef Saad. Fast estimation of tr(f(A)) via stochastic Lanczos quadrature. SIAM Journal on Matrix Analysis and Applications (SIMAX), 2017.
  67. Shashanka Ubaru and Yousef Saad. Fast methods for estimating the numerical rank of large matrices. In Proceedings of the 33rd International Conference on Machine Learning (ICML), pages 468-477, 2016.
  68. Shashanka Ubaru, Yousef Saad, and Abd-Krim Seghouane. Fast estimation of approximate matrix ranks using spectral densities. Neural Computation, 2017.
  69. Roy Varshavsky, Assaf Gottlieb, Michal Linial, and David Horn. Novel unsupervised feature filtering of biological data. Bioinformatics, 22(14):e507-e513, 2006.
  70. Virginia Vassilevska Williams. Multiplying matrices faster than Coppersmith-Winograd. In Proceedings of the 44th Annual ACM Symposium on Theory of Computing (STOC), 2012.
  71. Virginia Vassilevska Williams. Hardness of easy problems: Basing hardness on popular conjectures such as the strong exponential time hypothesis (invited talk). In Proceedings of the 10th International Symposium on Parameterized and Exact Computation (IPEC), pages 17-29, 2015.
  72. Virginia Vassilevska Williams and Ryan Williams. Subcubic equivalences between path, matrix and triangle problems. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 645-654, 2010.
  73. Shmuel Winograd. Arithmetic Complexity of Computations. CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, 1987. URL: http://dx.doi.org/10.1137/1.9781611970364.
  74. David P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10(1-2):1-157, 2014.
  75. Y. Xie, Y. Qu, D. Tao, W. Wu, Q. Yuan, and W. Zhang. Hyperspectral image restoration via iteratively regularized weighted Schatten p-norm minimization. IEEE Transactions on Geoscience and Remote Sensing, PP(99):1-18, 2016.
  76. Yuan Xie, Shuhang Gu, Yan Liu, Wangmeng Zuo, Wensheng Zhang, and Lei Zhang. Weighted Schatten p-norm minimization for image denoising and background subtraction. IEEE Transactions on Image Processing, 25(10):4842-4857, 2016.
  77. Yuchen Zhang, Martin J. Wainwright, and Michael I. Jordan. Distributed estimation of generalized matrix rank: Efficient algorithms and lower bounds. In Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015.