Learning Arithmetic Formulas in the Presence of Noise: A General Framework and Applications to Unsupervised Learning

Authors: Pritam Chandra, Ankit Garg, Neeraj Kayal, Kunal Mittal, Tanmay Sinha

Author Details

Pritam Chandra
  • Microsoft Research, Bangalore, India
Ankit Garg
  • Microsoft Research, Bangalore, India
Neeraj Kayal
  • Microsoft Research, Bangalore, India
Kunal Mittal
  • Princeton University, NJ, USA
Tanmay Sinha
  • Microsoft Research, Bangalore, India

Cite As

Pritam Chandra, Ankit Garg, Neeraj Kayal, Kunal Mittal, and Tanmay Sinha. Learning Arithmetic Formulas in the Presence of Noise: A General Framework and Applications to Unsupervised Learning. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 25:1-25:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Abstract

We present a general framework for designing efficient algorithms for unsupervised learning problems, such as mixtures of Gaussians and subspace clustering. Our framework is based on a meta algorithm that learns arithmetic formulas in the presence of noise, using lower bounds. This builds upon the recent work of Garg, Kayal and Saha (FOCS '20), who designed such a framework for learning arithmetic formulas without any noise. A key ingredient of our meta algorithm is an efficient algorithm for a novel problem called Robust Vector Space Decomposition. We show that our meta algorithm works well when certain matrices have sufficiently large smallest non-zero singular values. We conjecture that this condition holds for smoothed instances of our problems, and thus our framework would yield efficient algorithms for these problems in the smoothed setting.
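
As a minimal sketch of why a lower bound on the smallest non-zero singular value matters under noise, the following states a standard perturbation fact. The symbols M, E, r, and τ are illustrative notation assumed here, not taken from the paper; the specific matrices and the required thresholds are defined in the full text, and this is not claimed to be the authors' argument.

```latex
% Illustrative notation (assumed, not from the paper):
% M is a real matrix of rank r, with singular values
% \sigma_1(M) \ge \dots \ge \sigma_r(M) > 0 = \sigma_{r+1}(M) = \dots
\[
  \sigma_{\min}^{+}(M) \;=\; \sigma_r(M), \qquad r = \operatorname{rank}(M).
\]
% Weyl's inequality for singular values: for any noise matrix E
% of the same shape as M,
\[
  \lvert \sigma_i(M + E) - \sigma_i(M) \rvert \;\le\; \lVert E \rVert_2
  \quad \text{for all } i .
\]
% Consequently, if \sigma_r(M) \ge \tau and \lVert E \rVert_2 < \tau/2, then
\[
  \sigma_r(M + E) \;>\; \tau/2
  \qquad \text{and} \qquad
  \sigma_{r+1}(M + E) \;<\; \tau/2 .
\]
```

Under these assumptions, thresholding the singular values of the noisy matrix M + E at τ/2 recovers the rank r of the noiseless matrix M, which is one reason a condition of this kind is natural in a noise-tolerant algorithm.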

Subject Classification

ACM Subject Classification
  • Theory of computation → Approximation algorithms analysis

Keywords
  • Arithmetic Circuits
  • Robust Vector Space Decomposition
  • Subspace Clustering
  • Mixtures of Gaussians


References


  1. Animashree Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, and Matus Telgarsky. Tensor decompositions for learning latent variable models. Journal of Machine Learning Research, 15(1):2773-2832, January 2014.
  2. Nima Anari, Kuikui Liu, Shayan Oveis Gharan, and Cynthia Vinzant. Log-concave polynomials II: High-dimensional walks and an FPRAS for counting bases of a matroid. In STOC'19 - Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 1-12, 2019.
  3. Mitali Bafna, Jun-Ting Hsieh, Pravesh K Kothari, and Jeff Xu. Polynomial-time power-sum decomposition of polynomials. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 956-967. IEEE, 2022.
  4. Ainesh Bakshi, Ilias Diakonikolas, He Jia, Daniel M. Kane, Pravesh K. Kothari, and Santosh S. Vempala. Robustly learning mixtures of k arbitrary Gaussians. In STOC '22 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 1234-1247. ACM, New York, 2022.
  5. Aditya Bhaskara, Aidao Chen, Aidan Perreault, and Aravindan Vijayaraghavan. Smoothed analysis in unsupervised learning via decoupling. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 582-610. IEEE, 2019.
  6. Pritam Chandra, Ankit Garg, Neeraj Kayal, Kunal Mittal, and Tanmay Sinha. Learning arithmetic formulas in the presence of noise: A general framework and applications to unsupervised learning, 2023. URL: https://arxiv.org/abs/2311.07284.
  7. Sitan Chen, Jerry Li, Yuanzhi Li, and Anru R. Zhang. Learning polynomial transformations via generalized tensor decompositions. In Barna Saha and Rocco A. Servedio, editors, Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, Orlando, FL, USA, June 20-23, 2023, pages 1671-1684. ACM, 2023. URL: https://doi.org/10.1145/3564246.3585209.
  8. Alexander L. Chistov, Gábor Ivanyos, and Marek Karpinski. Polynomial time algorithms for modules over finite dimensional algebras. In Proceedings of the 1997 International Symposium on Symbolic and Algebraic Computation, ISSAC '97, Maui, Hawaii, USA, July 21-23, 1997, pages 68-74, 1997.
  9. Ankit Garg, Leonid Gurvits, Rafael Mendes de Oliveira, and Avi Wigderson. Operator scaling: Theory and applications. Found. Comput. Math., 20(2):223-290, 2020. URL: https://doi.org/10.1007/s10208-019-09417-z.
  10. Ankit Garg, Neeraj Kayal, and Chandan Saha. Learning sums of powers of low-degree polynomials in the non-degenerate case. In 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16-19, 2020, pages 889-899. IEEE, 2020. Open source version at URL: https://arxiv.org/abs/2004.06898.
  11. Rong Ge, Qingqing Huang, and Sham M. Kakade. Learning mixtures of gaussians in high dimensions. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC, pages 761-770, 2015. Open source version at URL: https://arxiv.org/abs/1503.00424.
  12. Erich Kaltofen, John P. May, Zhengfeng Yang, and Lihong Zhi. Approximate factorization of multivariate polynomials using singular value decomposition. Journal of Symbolic Computation, 43(5):359-376, 2008. URL: https://doi.org/10.1016/J.JSC.2007.11.005.
  13. Erich Kaltofen and Barry M. Trager. Computing with polynomials given by black boxes for their evaluations: Greatest common divisors, factorization, separation of numerators and denominators. Journal of Symbolic Computation, 9(3):301-320, 1990.
  14. Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455-500, 2009.
  15. Nimrod Megiddo and Arie Tamir. On the complexity of locating linear facilities in the plane. Operations Research Letters, 1(5):194-197, 1982. URL: https://doi.org/10.1016/0167-6377(82)90039-6.
  16. Lance Parsons, Ehtesham Haque, and Huan Liu. Subspace clustering for high dimensional data: A review. SIGKDD Explor. Newsl., 6(1):90-105, June 2004.
  17. Youming Qiao. Block diagonalization for adjoint action. Private communication, 2018.
  18. Wentao Qu, Xianchao Xiu, Huangyue Chen, and Lingchen Kong. A survey on high-dimensional subspace clustering. Mathematics, 11(2), 2023. URL: https://doi.org/10.3390/math11020436.
  19. Aravindan Vijayaraghavan. Efficient tensor decompositions. In Tim Roughgarden, editor, Beyond the Worst-Case Analysis of Algorithms, pages 424-444. Cambridge University Press, 2020. URL: https://arxiv.org/abs/2007.15589.
  20. Hanna Wallach. Topic modeling: Beyond bag-of-words. In ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning, volume 2006, pages 977-984, January 2006. URL: https://doi.org/10.1145/1143844.1143967.