Recovering Structured Probability Matrices

Authors: Qingqing Huang, Sham M. Kakade, Weihao Kong, Gregory Valiant




File

LIPIcs.ITCS.2018.46.pdf
  • Filesize: 0.53 MB
  • 14 pages



Cite As

Qingqing Huang, Sham M. Kakade, Weihao Kong, and Gregory Valiant. Recovering Structured Probability Matrices. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 94, pp. 46:1-46:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ITCS.2018.46

Abstract

We consider the problem of accurately recovering a matrix B of size M by M, which represents a probability distribution over M^2 outcomes, given access to an observed matrix of "counts" generated by taking independent samples from the distribution B. How can structural properties of the underlying matrix B be leveraged to yield computationally efficient and information-theoretically optimal reconstruction algorithms? When can accurate reconstruction be accomplished in the sparse data regime? This basic problem lies at the core of a number of questions currently being considered by different communities, including building recommendation systems and collaborative filtering in the sparse data regime, community detection in sparse random graphs, learning structured models such as topic models or hidden Markov models, and efforts by the natural language processing community to compute "word embeddings". Many aspects of this problem, both in terms of learning and property testing/estimation and on both the algorithmic and information-theoretic sides, remain open.

Our results apply to the setting where B has a low-rank structure. For this setting, we propose an efficient (and practically viable) algorithm that accurately recovers the underlying M by M matrix using O(M) samples (where we assume the rank is a constant). This linear sample complexity is optimal, up to constant factors, in an extremely strong sense: even testing basic properties of the underlying matrix (such as whether it has rank 1 or 2) requires Ω(M) samples. Additionally, we provide an even stronger lower bound showing that distinguishing whether a sequence of observations was drawn from the uniform distribution over M observations or generated by a well-conditioned Hidden Markov Model with two hidden states requires Ω(M) observations, while our positive results for recovering B immediately imply that O(M) observations suffice to learn such an HMM.
This lower bound precludes sublinear-sample hypothesis tests for basic properties, such as identity or uniformity, as well as sublinear-sample estimators for quantities such as the entropy rate of HMMs.
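To make the sampling model concrete, the following Python sketch (assuming NumPy; every function and variable name here is hypothetical) draws N independent samples from a rank-2 probability matrix B, forms the M by M count matrix, and compares the naive empirical estimate counts / N with a rank-truncated SVD of it. Plain truncated SVD is a common baseline swapped in purely for illustration: it is not the algorithm proposed in the paper, and this sketch makes no claim about matching the paper's O(M)-sample guarantee.

```python
import numpy as np


def sample_counts(B, n_samples, rng):
    """Draw n_samples i.i.d. cells (i, j) from the probability matrix B
    and return the M x M matrix of counts."""
    M = B.shape[0]
    flat = rng.multinomial(n_samples, B.ravel())
    return flat.reshape(M, M)


def truncated_svd_estimate(counts, rank):
    """Baseline estimator (assumption, NOT the paper's algorithm): project the
    empirical matrix counts / N onto its top-`rank` singular directions,
    then clip negative entries and renormalize to get a distribution."""
    emp = counts / counts.sum()
    U, s, Vt = np.linalg.svd(emp)
    low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    low_rank = np.clip(low_rank, 0.0, None)
    return low_rank / low_rank.sum()


def l1_error(A, B):
    """Entrywise l1 distance between two M x M matrices."""
    return float(np.abs(A - B).sum())


rng = np.random.default_rng(0)
M, rank = 200, 2
# Hypothetical rank-2 B: an equal mixture of two product distributions.
p = rng.dirichlet(np.ones(M))
q = rng.dirichlet(np.ones(M))
B = 0.5 * np.outer(p, p) + 0.5 * np.outer(q, q)

counts = sample_counts(B, n_samples=50 * M, rng=rng)
print("naive  l1 error:", l1_error(counts / counts.sum(), B))
print("rank-2 l1 error:", l1_error(truncated_svd_estimate(counts, rank), B))
```

With n_samples = 50 * M there are far fewer samples than the M^2 cells, so most cells are never observed; this is the sparse regime the abstract refers to. Clipping and renormalizing is just one simple way (an assumption of this sketch) to map the low-rank projection back to a valid distribution.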
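One standard way to see why learning a two-state HMM connects to recovering a low-rank B: if a stationary HMM has stationary distribution π, transition matrix T, and emission matrix O (column h holds the distribution over the M observations given hidden state h), then the joint distribution of two consecutive observations is B = O diag(π) T O^T, which has rank at most 2. The sketch below, assuming NumPy and made-up HMM parameters (all names hypothetical), builds this B, checks its rank, and forms the matrix of consecutive-pair counts from a simulated sequence, which plays the role of the observed count matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 50  # number of observation symbols (hypothetical)
# Hypothetical two-state HMM parameters.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])               # T[h, h'] = Pr(h_{t+1} = h' | h_t = h)
O = rng.dirichlet(np.ones(M), size=2).T  # O[i, h] = Pr(x_t = i | h_t = h), shape (M, 2)

# Stationary distribution: left eigenvector of T for eigenvalue 1, normalized.
w, V = np.linalg.eig(T.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()

# B[i, j] = Pr(x_t = i, x_{t+1} = j) = sum_{h, h'} pi[h] T[h, h'] O[i, h] O[j, h']
B = O @ np.diag(pi) @ T @ O.T


def sample_hmm(n, rng):
    """Simulate n observations from the HMM started in its stationary distribution."""
    xs = np.empty(n, dtype=int)
    h = rng.choice(2, p=pi)
    for t in range(n):
        xs[t] = rng.choice(M, p=O[:, h])
        h = rng.choice(2, p=T[h])
    return xs


n = 10_000
xs = sample_hmm(n, rng)
# Count matrix of consecutive pairs (x_t, x_{t+1}): n - 1 pairs, each distributed
# according to B by stationarity, but not independent of one another.
C = np.zeros((M, M), dtype=int)
np.add.at(C, (xs[:-1], xs[1:]), 1)

print("rank of B:", np.linalg.matrix_rank(B))
print("l1 distance of C/(n-1) from B:", np.abs(C / C.sum() - B).sum())
```

Note that the consecutive pairs in a single sequence are dependent, unlike the independent samples in the matrix-recovery setting; this sketch only illustrates where the rank-2 structure comes from and does not reproduce how the paper handles that dependence.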
Keywords
  • Random matrices
  • matrix recovery
  • stochastic block model
  • Hidden Markov Models

