Convergence Results for Neural Networks via Electrodynamics

Authors: Rina Panigrahy, Ali Rahimi, Sushant Sachdeva, Qiuyi Zhang




File

LIPIcs.ITCS.2018.22.pdf
  • Filesize: 0.68 MB
  • 19 pages

Document Identifiers
  • DOI: 10.4230/LIPIcs.ITCS.2018.22

Author Details

Rina Panigrahy
Ali Rahimi
Sushant Sachdeva
Qiuyi Zhang

Cite As

Rina Panigrahy, Ali Rahimi, Sushant Sachdeva, and Qiuyi Zhang. Convergence Results for Neural Networks via Electrodynamics. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 94, pp. 22:1-22:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.ITCS.2018.22

Abstract

We study whether a depth-two neural network can learn another depth-two network using gradient descent. Assuming a linear output node, we show that the question of whether gradient descent converges to the target function is equivalent to the following question in electrodynamics: given k fixed protons in R^d and k electrons, each moving under the attractive force from the protons and the repulsive force from the remaining electrons, will all the electrons be matched up with the protons at equilibrium, up to a permutation? Under the standard electrical force, this follows from the classic Earnshaw's theorem. In our setting, the force is determined by the activation function and the input distribution. Building on this equivalence, we prove the existence of an activation function such that gradient descent learns at least one of the hidden nodes in the target network. Iterating, we show that gradient descent can be used to learn the entire network one node at a time.
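
To make the stated equivalence concrete, here is a minimal sketch in notation introduced only for this summary (the symbols sigma, theta_i, w_j, Phi, and D below are ours, not taken verbatim from the paper). Assume the target depth-two network has k hidden units with weight vectors theta_1, ..., theta_k, activation sigma, and a linear output node with unit weights, and that the learner trains weights w_1, ..., w_k by gradient descent on the population square loss over inputs x drawn from a distribution D:

  L(w) = E_{x \sim D}\Big[ \big( \textstyle\sum_{j=1}^{k} \sigma(w_j \cdot x) - \sum_{i=1}^{k} \sigma(\theta_i \cdot x) \big)^2 \Big]
       = \sum_{j,j'} \Phi(w_j, w_{j'}) \;-\; 2 \sum_{j,i} \Phi(w_j, \theta_i) \;+\; \text{const},
  \quad \text{where } \Phi(u, v) = E_{x \sim D}\big[ \sigma(u \cdot x)\, \sigma(v \cdot x) \big].

Differentiating term by term, gradient flow moves each w_j as

  \frac{dw_j}{dt} = -\nabla_{w_j} L(w)
                  = 2 \sum_{i} \nabla_{w_j} \Phi(w_j, \theta_i) \;-\; 2 \sum_{j' \neq j} \nabla_{w_j} \Phi(w_j, w_{j'}) \;+\; (\text{a self-interaction term from } \Phi(w_j, w_j), \text{ suppressed here}),

where the gradient is taken in the first argument of Phi. For typical activations and input distributions, Phi(u, v) increases as u and v align, so the first sum acts as an attraction toward the fixed hidden units theta_i (the protons) and the second as a repulsion away from the other moving units w_{j'} (the electrons), with Phi playing the role of the electrostatic potential. Under this reading, gradient descent recovering the target network is the same statement as every electron settling on a distinct proton at equilibrium.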

Subject Classification

Keywords
  • Deep Learning
  • Learning Theory
  • Non-convex Optimization

