Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions

Authors: Ioannis Panageas, Georgios Piliouras



File

LIPIcs.ITCS.2017.2.pdf (0.67 MB, 12 pages)
Author Details

Ioannis Panageas
Georgios Piliouras

Cite As

Ioannis Panageas and Georgios Piliouras. Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions. In 8th Innovations in Theoretical Computer Science Conference (ITCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 67, pp. 2:1-2:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.ITCS.2017.2

Abstract

Given a twice continuously differentiable cost function f, we prove that the set of initial conditions from which gradient descent converges to saddle points where \nabla^2 f has at least one strictly negative eigenvalue has (Lebesgue) measure zero, even for cost functions f with non-isolated critical points, answering an open question in [Lee, Simchowitz, Jordan, Recht, COLT 2016]. Moreover, this result extends to forward-invariant convex subspaces, allowing for weak (non-globally Lipschitz) smoothness assumptions. Finally, we produce an upper bound on the allowable step-size.
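
The abstract's claim can be illustrated numerically. Below is a minimal sketch (not code from the paper), assuming NumPy is available: plain gradient descent x_{k+1} = x_k - alpha * \nabla f(x_k) is run on a toy cost f(x, y) = x^4/4 - x^2/2 + y^2/2, whose origin is a strict saddle and whose minimizers are (+-1, 0). The cost, step size, iteration count, and sampling box are illustrative choices made for this example, not the authors'.

    import numpy as np

    # Toy cost (not from the paper): f(x, y) = x**4/4 - x**2/2 + y**2/2.
    # The origin is a strict saddle (Hessian eigenvalues -1 and 1) and
    # (+1, 0), (-1, 0) are the only minimizers.

    def grad_f(p):
        x, y = p
        return np.array([x**3 - x, y])

    def gradient_descent(p0, step=0.05, iters=2000):
        # Plain fixed-step gradient descent: p <- p - step * grad_f(p).
        # The step size is an illustrative choice kept well below 1/L,
        # where L bounds the Lipschitz constant of grad_f on the
        # sampling box [-2, 2]^2 used below.
        p = np.array(p0, dtype=float)
        for _ in range(iters):
            p = p - step * grad_f(p)
        return p

    # For almost every random initialization the iterates approach a
    # minimizer (+-1, 0) rather than the saddle at the origin.
    rng = np.random.default_rng(0)
    for _ in range(5):
        p0 = rng.uniform(-2.0, 2.0, size=2)
        print(np.round(p0, 3), "->", np.round(gradient_descent(p0), 3))

In this toy example only initializations with first coordinate exactly zero are driven to the saddle, a set of measure zero, consistent with the paper's statement.
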
Keywords
  • Gradient Descent
  • Center-stable manifold
  • Saddle points
  • Hessian


References

  1. Sanjeev Arora, Rong Ge, Tengyu Ma, and Ankur Moitra. Simple, efficient, and neural algorithms for sparse coding. In 28th Conference on Learning Theory (COLT), pages 113-149, 2015.
  2. Emmanuel J. Candes, Xiaodong Li, and Mahdi Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985-2007, 2015.
  3. Erick Chastain, Adi Livnat, Christos Papadimitriou, and Umesh Vazirani. Algorithms, games, and evolution. Proceedings of the National Academy of Sciences (PNAS), 111(29):10620-10623, 2014.
  4. Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, and Yann LeCun. The loss surfaces of multilayer networks. arXiv preprint arXiv:1412.0233, 2014.
  5. Andrew R. Conn, Nicholas I. M. Gould, and Philippe L. Toint. Trust Region Methods, volume 1. SIAM, 2000.
  6. Yann N. Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems (NIPS), pages 2933-2941, 2014.
  7. Rong Ge, Furong Huang, Chi Jin, and Yang Yuan. Escaping from saddle points - online stochastic gradient for tensor decomposition. arXiv preprint arXiv:1503.02101, 2015.
  8. John L. Kelley. General Topology. Springer, 1955.
  9. Raghunandan H. Keshavan, Sewoong Oh, and Andrea Montanari. Matrix completion from a few entries. In IEEE International Symposium on Information Theory (ISIT), pages 324-328, 2009.
  10. Robert Kleinberg, Georgios Piliouras, and Eva Tardos. Multiplicative updates outperform generic no-regret learning in congestion games. In Symposium on Theory of Computing (STOC), pages 533-542, 2009.
  11. Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems (NIPS), pages 556-562, 2001.
  12. Jason D. Lee, Max Simchowitz, Michael I. Jordan, and Benjamin Recht. Gradient descent only converges to minimizers. In Conference on Learning Theory (COLT), 2016.
  13. Ruta Mehta, Ioannis Panageas, and Georgios Piliouras. Natural selection as an inhibitor of genetic diversity: Multiplicative weights updates algorithm and a conjecture of haploid genetics. In Innovations in Theoretical Computer Science (ITCS), 2015.
  14. Ruta Mehta, Ioannis Panageas, Georgios Piliouras, Prasad Tetali, and Vijay V. Vazirani. Mutation, sexual reproduction and survival in dynamic environments. In Innovations in Theoretical Computer Science (ITCS), 2017.
  15. Ruta Mehta, Ioannis Panageas, Georgios Piliouras, and Sadra Yazdanbod. The computational complexity of genetic diversity. In European Symposium on Algorithms (ESA), 2016.
  16. Reshef Meir and David Parkes. On sex, evolution, and the multiplicative weights update algorithm. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 929-937, 2015.
  17. Jorge J. Moré and Danny C. Sorensen. On the use of directions of negative curvature in a modified Newton method. Mathematical Programming, 16(1):1-20, 1979.
  18. Yurii Nesterov. Introductory Lectures on Convex Optimization, volume 87. Springer Science and Business Media, 2004.
  19. Yurii Nesterov and Boris T. Polyak. Cubic regularization of Newton method and its global performance. Mathematical Programming, 108(1):177-205, 2006.
  20. Ioannis Panageas and Georgios Piliouras. Average case performance of replicator dynamics in potential games via computing regions of attraction. In 17th ACM Conference on Economics and Computation (EC), 2016.
  21. Robin Pemantle. Nonconvergence to unstable points in urn models and stochastic approximations. The Annals of Probability, pages 698-712, 1990.
  22. Lawrence Perko. Differential Equations and Dynamical Systems. Springer, 3rd edition, 1991.
  23. A. Ravindran, Gintaras Victor Reklaitis, and Kenneth Martin Ragsdell. Engineering Optimization: Methods and Applications. John Wiley & Sons, 2006.
  24. Levent Sagun, Leon Bottou, and Yann LeCun. Singularity of the Hessian in deep learning. arXiv preprint arXiv:1611.07476, 2016.
  25. William H. Sandholm. Evolutionary game theory. In Encyclopedia of Complexity and Systems Science, pages 3176-3205. Springer, 2009.
  26. Michael Shub. Global Stability of Dynamical Systems. Springer-Verlag, 1987.
  27. Michael Spivak. Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus. Addison-Wesley, 1965.
  28. Ju Sun, Qing Qu, and John Wright. Complete dictionary recovery over the sphere II: Recovery by Riemannian trust-region method. arXiv preprint arXiv:1511.04777, 2015.
  29. Yuchen Zhang, Xi Chen, Denny Zhou, and Michael I. Jordan. Spectral methods meet EM: A provably optimal algorithm for crowdsourcing. In Advances in Neural Information Processing Systems (NIPS), pages 1260-1268, 2014.