An Improved Analysis of the ER-SpUD Dictionary Learning Algorithm

Authors: Jaroslaw Blasiok, Jelani Nelson

Cite As

Jaroslaw Blasiok and Jelani Nelson. An Improved Analysis of the ER-SpUD Dictionary Learning Algorithm. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 55, pp. 44:1-44:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.ICALP.2016.44

Abstract

In dictionary learning we observe Y = AX + E for some Y in R^{n*p}, A in R^{n*m}, and X in R^{m*p}, where p >= max{n, m}, and typically m >= n. The matrix Y is observed, and A, X, E are unknown. Here E is a "noise" matrix of small norm, and X is column-wise sparse. The matrix A is referred to as a dictionary, and its columns as atoms. Then, given some small number p of samples, i.e. columns of Y, the goal is to learn the dictionary A up to small error, as well as the coefficient matrix X. In applications one could for example think of each column of Y as a distinct image in a database. The motivation is that in many applications data is expected to be sparse when represented by atoms in the "right" dictionary A (e.g. images in the Haar wavelet basis), and the goal is to learn A from the data to then use it for other applications.
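
As a concrete, purely illustrative instance of this model (not code from the paper), the sketch below generates synthetic data Y = AX + E with a column-wise sparse X; the dimensions, the sparsity level theta, and the noise scale are assumptions chosen for demonstration only.

    import numpy as np

    # Illustrative synthetic instance of the model Y = AX + E (all sizes and the
    # sparsity level theta below are assumptions, not values from the paper).
    rng = np.random.default_rng(0)
    n = 50                       # signal dimension
    m = 50                       # number of atoms (square case m = n)
    p = int(10 * n * np.log(n))  # number of samples, on the order of n log n
    theta = 1.0 / np.sqrt(n)     # expected fraction of non-zeroes per column of X

    A = rng.standard_normal((n, m))            # unknown dictionary; columns are atoms
    support = rng.random((m, p)) < theta       # Bernoulli(theta) support pattern
    X = support * rng.standard_normal((m, p))  # column-wise sparse, subgaussian entries
    E = 1e-3 * rng.standard_normal((n, p))     # small "noise" matrix (E = 0 in the noiseless case)
    Y = A @ X + E                              # the only matrix the learner observes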

Recently, the work of [Spielman/Wang/Wright, COLT'12] proposed the dictionary learning algorithm ER-SpUD with provable guarantees when E = 0 and m = n. That work showed that if X has independent entries with an expected theta*n non-zeroes per column for 1/n <~ theta <~ 1/sqrt(n), and with non-zero entries being subgaussian, then for p >~ n^2 log^2 n with high probability ER-SpUD outputs matrices A', X' which equal A, X up to permuting and scaling columns (resp. rows) of A (resp. X). They conjectured that p >~ n log n suffices, which they showed was information-theoretically necessary for any algorithm to succeed when theta =~ 1/n. Significant progress toward showing that p >~ n log^4 n might suffice was later obtained in [Luh/Vu, FOCS'15].
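
For context, the core subproblem of the original ER-SpUD algorithm of [Spielman/Wang/Wright, COLT'12] is, for each constraint vector b formed from one or two observed columns of Y, the linear program min_w ||w^T Y||_1 subject to b^T w = 1; the products w^T Y are then expected to recover scaled rows of X. The sketch below is a minimal, hedged illustration of that single LP step via scipy.optimize.linprog; it omits the choice of constraint vectors and the greedy selection of rows (and does not show this paper's variant), and the helper name er_spud_lp is ours, not from the paper.

    import numpy as np
    from scipy.optimize import linprog

    def er_spud_lp(Y, b):
        """Solve min_w ||w^T Y||_1 subject to b^T w = 1 as a linear program.

        Sketch of the ER-SpUD core subproblem: introduce slacks t >= |Y^T w|
        coordinate-wise and minimize sum(t) over variables (w, t) in R^{n+p}.
        """
        n, p = Y.shape
        c = np.concatenate([np.zeros(n), np.ones(p)])           # minimize sum of slacks
        # |Y^T w| <= t  <=>  Y^T w - t <= 0  and  -Y^T w - t <= 0
        A_ub = np.block([[Y.T, -np.eye(p)],
                         [-Y.T, -np.eye(p)]])
        b_ub = np.zeros(2 * p)
        A_eq = np.concatenate([b, np.zeros(p)]).reshape(1, -1)  # b^T w = 1
        b_eq = np.array([1.0])
        bounds = [(None, None)] * n + [(0, None)] * p           # w free, slacks nonnegative
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds, method="highs")
        w = res.x[:n]
        return w, w @ Y  # w^T Y should be (a scaling of) one sparse row of X

    # Example usage with Y from the sketch above and b a single observed column,
    # as in ER-SpUD(SC); ER-SpUD(DC) instead uses sums of pairs of columns:
    # w, row = er_spud_lp(Y, Y[:, 0])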

In this work, we show that for a slight variant of ER-SpUD, p >~ n log(n/delta) samples suffice for successful recovery with probability 1 - delta. We also show that without our slight variation made to ER-SpUD, p >~ n^{1.99} samples are required even to learn A, X with a small success probability of 1/poly(n). This resolves the main conjecture of [Spielman/Wang/Wright, COLT'12], and contradicts a result of [Luh/Vu, FOCS'15], which claimed that p >~ n log^4 n guarantees high probability of success for the original ER-SpUD algorithm.

Keywords
  • dictionary learning
  • stochastic processes
  • generic chaining

References

  1. Radosław Adamczak. A note on the sample complexity of the Er-SpUD algorithm by Spielman, Wang and Wright for exact recovery of sparsely used dictionaries. CoRR, abs/1601.02049, 2016.
  2. Alekh Agarwal, Animashree Anandkumar, Prateek Jain, Praneeth Netrapalli, and Rashish Tandon. Learning sparsely used overcomplete dictionaries. In Proceedings of The 27th Conference on Learning Theory (COLT), pages 123-137, 2014.
  3. M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311-4322, November 2006.
  4. Sanjeev Arora, Aditya Bhaskara, Rong Ge, and Tengyu Ma. More algorithms for provable dictionary learning. CoRR, abs/1401.0579, 2014.
  5. Sanjeev Arora, Rong Ge, and Ankur Moitra. New algorithms for learning incoherent and overcomplete dictionaries. In Proceedings of The 27th Conference on Learning Theory (COLT), pages 779-806, 2014.
  6. Sanjeev Arora, Rong Ge, Ankur Moitra, and Sushant Sachdeva. Provable ICA with unknown gaussian noise, and implications for gaussian mixtures and autoencoders. Algorithmica, 72(1):215-236, 2015.
  7. Boaz Barak, Jonathan A. Kelner, and David Steurer. Dictionary learning and tensor decomposition via the sum-of-squares method. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC), pages 143-151, 2015.
  8. Mikhail Belkin, Luis Rademacher, and James R. Voss. Blind signal separation in the presence of gaussian noise. In Proceedings of the 26th Annual Conference on Learning Theory (COLT), pages 270-287, 2013.
  9. Stephane Boucheron, Gabor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
  10. Ori Bryt and Michael Elad. Compression of facial images using the K-SVD algorithm. J. Visual Communication and Image Representation, 19(4):270-282, 2008.
  11. Sjoerd Dirksen. Tail bounds via generic chaining. Electron. J. Probab., 20(53):1-29, 2015.
  12. Michael Elad and Michal Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736-3745, 2006.
  13. Alan M. Frieze, Mark Jerrum, and Ravi Kannan. Learning linear transformations. In Proceedings of the 37th Annual Symposium on Foundations of Computer Science (FOCS), pages 359-368, 1996.
  14. Navin Goyal, Santosh Vempala, and Ying Xiao. Fourier PCA and robust tensor decomposition. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014, pages 584-593, 2014.
  15. Michel Ledoux and Michel Talagrand. Probability in Banach Spaces: Isoperimetry and Processes. Springer-Verlag, 1991.
  16. Yuanqing Li, Zhu Liang Yu, Ning Bi, Yong Xu, Zhenghui Gu, and S.-I. Amari. Sparse representation for brain signal processing: A tutorial on methods and applications. Signal Processing Magazine, IEEE, 31(3):96-106, May 2014. URL: http://dx.doi.org/10.1109/MSP.2013.2296790.
  17. Kyle Luh and Van Vu. Random matrices: l₁ concentration and dictionary learning with few samples. In Proceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 1409-1425, 2015.
  18. Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro. Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11:19-60, 2010.
  19. Julien Mairal, Francis R. Bach, and Jean Ponce. Sparse modeling for image and vision processing. Foundations and Trends in Computer Graphics and Vision, 8(2-3):85-283, 2014.
  20. Julien Mairal, Francis R. Bach, Jean Ponce, Guillermo Sapiro, and Andrew Zisserman. Supervised dictionary learning. In Proceedings of the 22nd Annual Conference on Advances in Neural Information Processing Systems (NIPS), pages 1033-1040, 2008.
  21. Julien Mairal, Francis R. Bach, Jean Ponce, Guillermo Sapiro, and Andrew Zisserman. Non-local sparse models for image restoration. In IEEE 12th International Conference on Computer Vision (ICCV), pages 2272-2279, 2009.
  22. Phong Q. Nguyen and Oded Regev. Learning a parallelepiped: Cryptanalysis of GGH and NTRU signatures. J. Cryptology, 22(2):139-160, 2009.
  23. Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, and Andrew Y. Ng. Self-taught learning: transfer learning from unlabeled data. In Proceedings of the Twenty-Fourth International Conference on Machine Learning (ICML), pages 759-766, 2007.
  24. Daniel A. Spielman, Huan Wang, and John Wright. Exact recovery of sparsely-used dictionaries. In The 25th Annual Conference on Learning Theory (COLT), pages 37.1-37.18, 2012. Full version: http://arxiv.org/abs/1206.5882v1.
  25. Ju Sun, Qing Qu, and John Wright. Complete dictionary recovery over the sphere. CoRR, abs/1504.06785, 2015.
  26. Michel Talagrand. Upper and lower bounds for stochastic processes: modern methods and classical problems. Springer, 2014.
  27. Santosh Vempala and Ying Xiao. Max vs Min: Tensor decomposition and ICA with nearly linear sample complexity. In Proceedings of The 28th Conference on Learning Theory (COLT), pages 1710-1723, 2015.