Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions

Authors: Ioannis Panageas, Georgios Piliouras



File

LIPIcs.ITCS.2017.2.pdf (0.67 MB, 12 pages)
Author Details

Ioannis Panageas
Georgios Piliouras

Cite As

Ioannis Panageas and Georgios Piliouras. Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions. In 8th Innovations in Theoretical Computer Science Conference (ITCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 67, pp. 2:1-2:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.ITCS.2017.2

Abstract

Given a twice continuously differentiable cost function f, we prove that the set of initial conditions from which gradient descent converges to saddle points where \nabla^2 f has at least one strictly negative eigenvalue has (Lebesgue) measure zero, even for cost functions f with non-isolated critical points, answering an open question in [Lee, Simchowitz, Jordan, Recht, COLT 2016]. Moreover, this result extends to forward-invariant convex subspaces, allowing for weak (non-globally Lipschitz) smoothness assumptions. Finally, we produce an upper bound on the allowable step-size.
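
The abstract's claim can be illustrated numerically. Below is a minimal sketch (not code from the paper), assuming NumPy is available: plain gradient descent x_{k+1} = x_k - alpha * \nabla f(x_k) is run on a toy cost f(x, y) = x^4/4 - x^2/2 + y^2/2, whose origin is a strict saddle and whose minimizers are (+-1, 0). The cost, step size, iteration count, and sampling box are illustrative choices made for this example, not the authors'.

    import numpy as np

    # Toy cost (not from the paper): f(x, y) = x**4/4 - x**2/2 + y**2/2.
    # The origin is a strict saddle (Hessian eigenvalues -1 and 1) and
    # (+1, 0), (-1, 0) are the only minimizers.

    def grad_f(p):
        x, y = p
        return np.array([x**3 - x, y])

    def gradient_descent(p0, step=0.05, iters=2000):
        # Plain fixed-step gradient descent: p <- p - step * grad_f(p).
        # The step size is an illustrative choice kept well below 1/L,
        # where L bounds the Lipschitz constant of grad_f on the
        # sampling box [-2, 2]^2 used below.
        p = np.array(p0, dtype=float)
        for _ in range(iters):
            p = p - step * grad_f(p)
        return p

    # For almost every random initialization the iterates approach a
    # minimizer (+-1, 0) rather than the saddle at the origin.
    rng = np.random.default_rng(0)
    for _ in range(5):
        p0 = rng.uniform(-2.0, 2.0, size=2)
        print(np.round(p0, 3), "->", np.round(gradient_descent(p0), 3))

In this toy example only initializations with first coordinate exactly zero are driven to the saddle, a set of measure zero, consistent with the paper's statement.
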
Keywords
  • Gradient Descent
  • Center-stable manifold
  • Saddle points
  • Hessian


References

  1. Sanjeev Arora, Rong Ge, Tengyu Ma, and Ankur Moitra. Simple, efficient, and neural algorithms for sparse coding. In 28th Conference on Learning Theory (COLT), pages 113-149, 2015.
  2. Emmanuel J. Candes, Xiaodong Li, and Mahdi Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985-2007, 2015.
  3. Erick Chastain, Adi Livnat, Christos Papadimitriou, and Umesh Vazirani. Algorithms, games, and evolution. Proceedings of the National Academy of Sciences (PNAS), 111(29):10620-10623, 2014.
  4. Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, and Yann LeCun. The loss surfaces of multilayer networks. arXiv preprint arXiv:1412.0233, 2014.
  5. Andrew R. Conn, Nicholas I. M. Gould, and Philippe L. Toint. Trust Region Methods, volume 1. SIAM, 2000.
  6. Yann N. Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems (NIPS), pages 2933-2941, 2014.
  7. Rong Ge, Furong Huang, Chi Jin, and Yang Yuan. Escaping from saddle points - online stochastic gradient for tensor decomposition. arXiv preprint arXiv:1503.02101, 2015.
  8. John L. Kelley. General Topology. Springer, 1955.
  9. Raghunandan H. Keshavan, Sewoong Oh, and Andrea Montanari. Matrix completion from a few entries. In IEEE International Symposium on Information Theory (ISIT), pages 324-328, 2009.
  10. Robert Kleinberg, Georgios Piliouras, and Eva Tardos. Multiplicative updates outperform generic no-regret learning in congestion games. In Symposium on Theory of Computing (STOC), pages 533-542, 2009.
  11. Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems (NIPS), pages 556-562, 2001.
  12. Jason D. Lee, Max Simchowitz, Michael I. Jordan, and Benjamin Recht. Gradient descent only converges to minimizers. In Conference on Learning Theory (COLT), 2016.
  13. Ruta Mehta, Ioannis Panageas, and Georgios Piliouras. Natural selection as an inhibitor of genetic diversity: Multiplicative weights updates algorithm and a conjecture of haploid genetics. In Innovations in Theoretical Computer Science (ITCS), 2015.
  14. Ruta Mehta, Ioannis Panageas, Georgios Piliouras, Prasad Tetali, and Vijay V. Vazirani. Mutation, sexual reproduction and survival in dynamic environments. In Innovations in Theoretical Computer Science (ITCS), 2017.
  15. Ruta Mehta, Ioannis Panageas, Georgios Piliouras, and Sadra Yazdanbod. The computational complexity of genetic diversity. In European Symposium on Algorithms (ESA), 2016.
  16. Reshef Meir and David Parkes. On sex, evolution, and the multiplicative weights update algorithm. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 929-937, 2015.
  17. Jorge J. Moré and Danny C. Sorensen. On the use of directions of negative curvature in a modified Newton method. Mathematical Programming, 16(1):1-20, 1979.
  18. Yurii Nesterov. Introductory Lectures on Convex Optimization, volume 87. Springer Science and Business Media, 2004.
  19. Yurii Nesterov and Boris T. Polyak. Cubic regularization of Newton method and its global performance. Mathematical Programming, 108(1):177-205, 2006.
  20. Ioannis Panageas and Georgios Piliouras. Average case performance of replicator dynamics in potential games via computing regions of attraction. In 17th ACM Conference on Economics and Computation (EC), 2016.
  21. Robin Pemantle. Nonconvergence to unstable points in urn models and stochastic approximations. The Annals of Probability, pages 698-712, 1990.
  22. Lawrence Perko. Differential Equations and Dynamical Systems. Springer, 3rd edition, 1991.
  23. A. Ravindran, Gintaras Victor Reklaitis, and Kenneth Martin Ragsdell. Engineering Optimization: Methods and Applications. John Wiley & Sons, 2006.
  24. Levent Sagun, Leon Bottou, and Yann LeCun. Singularity of the Hessian in deep learning. arXiv preprint arXiv:1611.07476, 2016.
  25. William H. Sandholm. Evolutionary game theory. In Encyclopedia of Complexity and Systems Science, pages 3176-3205. Springer, 2009.
  26. Michael Shub. Global Stability of Dynamical Systems. Springer-Verlag, 1987.
  27. Michael Spivak. Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus. Addison-Wesley, 1965.
  28. Ju Sun, Qing Qu, and John Wright. Complete dictionary recovery over the sphere II: Recovery by Riemannian trust-region method. arXiv preprint arXiv:1511.04777, 2015.
  29. Yuchen Zhang, Xi Chen, Denny Zhou, and Michael I. Jordan. Spectral methods meet EM: A provably optimal algorithm for crowdsourcing. In Advances in Neural Information Processing Systems (NIPS), pages 1260-1268, 2014.