An Improved Analysis of the ER-SpUD Dictionary Learning Algorithm

Authors Jaroslaw Blasiok, Jelani Nelson

  • Filesize: 0.59 MB
  • 14 pages

Cite As

Jaroslaw Blasiok and Jelani Nelson. An Improved Analysis of the ER-SpUD Dictionary Learning Algorithm. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 55, pp. 44:1-44:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


In dictionary learning we observe Y = AX + E for some Y in R^{n×p}, A in R^{n×m}, and X in R^{m×p}, where p >= max{n, m}, and typically m >= n. The matrix Y is observed, and A, X, E are unknown. Here E is a "noise" matrix of small norm, and X is column-wise sparse. The matrix A is referred to as a dictionary, and its columns as atoms. Then, given some small number p of samples, i.e. columns of Y, the goal is to learn the dictionary A up to small error, as well as the coefficient matrix X. In applications one could for example think of each column of Y as a distinct image in a database. The motivation is that in many applications data is expected to be sparse when represented by atoms in the "right" dictionary A (e.g. images in the Haar wavelet basis), and the goal is to learn A from the data so that it can then be used in other applications. Recently, the work of [Spielman/Wang/Wright, COLT'12] proposed the dictionary learning algorithm ER-SpUD with provable guarantees when E = 0 and m = n. That work showed that if X has independent entries with an expected theta*n non-zeroes per column for 1/n <~ theta <~ 1/sqrt(n), and with non-zero entries being subgaussian, then for p >~ n^2 log^2 n, with high probability ER-SpUD outputs matrices A', X' which equal A, X up to permuting and scaling columns (resp. rows) of A (resp. X). They conjectured that p >~ n log n suffices, which they showed is information-theoretically necessary for any algorithm to succeed when theta =~ 1/n. Significant progress toward showing that p >~ n log^4 n might suffice was later obtained in [Luh/Vu, FOCS'15]. In this work, we show that for a slight variant of ER-SpUD, p >~ n log(n/delta) samples suffice for successful recovery with probability 1 - delta. We also show that without our slight variation to ER-SpUD, p >~ n^{1.99} samples are required even to learn A, X with the small success probability 1/poly(n).
This resolves the main conjecture of [Spielman/Wang/Wright, COLT'12], and contradicts a result of [Luh/Vu, FOCS'15], which claimed that p >~ n log^4 n guarantees high probability of success for the original ER-SpUD algorithm.
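The probabilistic model above can be instantiated with a short sketch: generating a synthetic noiseless instance Y = AX in the square case m = n, with Bernoulli(theta)-Gaussian coefficients so each column of X has an expected theta*n subgaussian non-zero entries. The concrete values of n, theta, and p below are illustrative choices consistent with the stated regime, not parameters from the paper.

```python
import numpy as np

# Synthetic instance of the dictionary learning model Y = AX + E from the
# abstract, in the noiseless (E = 0), square (m = n) case that ER-SpUD
# addresses. Parameter values here are illustrative, not from the paper.
rng = np.random.default_rng(0)

n = 50                       # dimension; square dictionary, m = n
theta = 1.0 / np.sqrt(n)     # sparsity level in the range 1/n <~ theta <~ 1/sqrt(n)
p = int(4 * n * np.log(n))   # sample count on the order n log n

A = rng.standard_normal((n, n))   # unknown dictionary; its columns are the atoms

# Bernoulli-Gaussian coefficients: each entry of X is non-zero with
# probability theta, and non-zero values are standard Gaussian (subgaussian).
support = rng.random((n, p)) < theta
X = np.where(support, rng.standard_normal((n, p)), 0.0)

Y = A @ X                    # the observed data: p samples, one per column

# Average number of non-zeroes per column should be close to theta * n.
print(Y.shape, support.sum() / p)
```

A recovery algorithm in this model is judged by whether its output A', X' matches A, X up to the inherent ambiguity of permuting and rescaling the columns of A (and correspondingly the rows of X).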
Keywords

  • dictionary learning
  • stochastic processes
  • generic chaining




References

  1. Radosław Adamczak. A note on the sample complexity of the Er-SpUD algorithm by Spielman, Wang and Wright for exact recovery of sparsely used dictionaries. CoRR, abs/1601.02049, 2016. Google Scholar
  2. Alekh Agarwal, Animashree Anandkumar, Prateek Jain, Praneeth Netrapalli, and Rashish Tandon. Learning sparsely used overcomplete dictionaries. In Proceedings of The 27th Conference on Learning Theory (COLT), pages 123-137, 2014. Google Scholar
  3. M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. Trans. Sig. Proc., 54(11):4311-4322, November 2006. Google Scholar
  4. Sanjeev Arora, Aditya Bhaskara, Rong Ge, and Tengyu Ma. More algorithms for provable dictionary learning. CoRR, abs/1401.0579, 2014. Google Scholar
  5. Sanjeev Arora, Rong Ge, and Ankur Moitra. New algorithms for learning incoherent and overcomplete dictionaries. In Proceedings of The 27th Conference on Learning Theory (COLT), pages 779-806, 2014. Google Scholar
  6. Sanjeev Arora, Rong Ge, Ankur Moitra, and Sushant Sachdeva. Provable ICA with unknown gaussian noise, and implications for gaussian mixtures and autoencoders. Algorithmica, 72(1):215-236, 2015. Google Scholar
  7. Boaz Barak, Jonathan A. Kelner, and David Steurer. Dictionary learning and tensor decomposition via the sum-of-squares method. In Proceedings of the 47th Annual ACM on Symposium on Theory of Computing (STOC), pages 143-151, 2015. Google Scholar
  8. Mikhail Belkin, Luis Rademacher, and James R. Voss. Blind signal separation in the presence of gaussian noise. In Proceedings of the 26th Annual Conference on Learning Theory (COLT), pages 270-287, 2013. Google Scholar
  9. Stephane Boucheron, Gabor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013. Google Scholar
  10. Ori Bryt and Michael Elad. Compression of facial images using the K-SVD algorithm. J. Visual Communication and Image Representation, 19(4):270-282, 2008. Google Scholar
  11. Sjoerd Dirksen. Tail bounds via generic chaining. Electron. J. Probab., 20(53):1-29, 2015. Google Scholar
  12. Michael Elad and Michal Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736-3745, 2006. Google Scholar
  13. Alan M. Frieze, Mark Jerrum, and Ravi Kannan. Learning linear transformations. In Proceedings of the 37th Annual Symposium on Foundations of Computer Science (FOCS), pages 359-368, 1996. Google Scholar
  14. Navin Goyal, Santosh Vempala, and Ying Xiao. Fourier PCA and robust tensor decomposition. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014, pages 584-593, 2014. Google Scholar
  15. Michel Ledoux and Michel Talagrand. Probability in Banach Spaces: Isoperimetry and Processes. Springer-Verlag, 1991. Google Scholar
  16. Yuanqing Li, Zhu Liang Yu, Ning Bi, Yong Xu, Zhenghui Gu, and S.-I. Amari. Sparse representation for brain signal processing: A tutorial on methods and applications. Signal Processing Magazine, IEEE, 31(3):96-106, May 2014. Google Scholar
  17. Kyle Luh and Van Vu. Random matrices: l₁ concentration and dictionary learning with few samples. In Proceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 1409-1425, 2015. Google Scholar
  18. Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro. Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11:19-60, 2010. Google Scholar
  19. Julien Mairal, Francis R. Bach, and Jean Ponce. Sparse modeling for image and vision processing. Foundations and Trends in Computer Graphics and Vision, 8(2-3):85-283, 2014. Google Scholar
  20. Julien Mairal, Francis R. Bach, Jean Ponce, Guillermo Sapiro, and Andrew Zisserman. Supervised dictionary learning. In Proceedings of the 22nd Annual Conference on Advances in Neural Information Processing Systems (NIPS), pages 1033-1040, 2008. Google Scholar
  21. Julien Mairal, Francis R. Bach, Jean Ponce, Guillermo Sapiro, and Andrew Zisserman. Non-local sparse models for image restoration. In IEEE 12th International Conference on Computer Vision (ICCV), pages 2272-2279, 2009. Google Scholar
  22. Phong Q. Nguyen and Oded Regev. Learning a parallelepiped: Cryptanalysis of GGH and NTRU signatures. J. Cryptology, 22(2):139-160, 2009. Google Scholar
  23. Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, and Andrew Y. Ng. Self-taught learning: transfer learning from unlabeled data. In Proceedings of the Twenty-Fourth International Conference on Machine Learning (ICML), pages 759-766, 2007. Google Scholar
  24. Daniel A. Spielman, Huan Wang, and John Wright. Exact recovery of sparsely-used dictionaries. In The 25th Annual Conference on Learning Theory (COLT), pages 37.1-37.18, 2012. Google Scholar
  25. Ju Sun, Qing Qu, and John Wright. Complete dictionary recovery over the sphere. CoRR, abs/1504.06785, 2015. Google Scholar
  26. Michel Talagrand. Upper and lower bounds for stochastic processes: modern methods and classical problems. Springer, 2014. Google Scholar
  27. Santosh Vempala and Ying Xiao. Max vs Min: Tensor decomposition and ICA with nearly linear sample complexity. In Proceedings of The 28th Conference on Learning Theory (COLT), pages 1710-1723, 2015. Google Scholar