Accelerated Extra-Gradient Descent: A Novel Accelerated First-Order Method

Authors: Jelena Diakonikolas and Lorenzo Orecchia





Cite As

Jelena Diakonikolas and Lorenzo Orecchia. Accelerated Extra-Gradient Descent: A Novel Accelerated First-Order Method. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 94, pp. 23:1-23:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ITCS.2018.23

Abstract

We provide a novel accelerated first-order method that achieves the asymptotically optimal convergence rate for smooth functions in the first-order oracle model. Until now, Nesterov's Accelerated Gradient Descent (AGD) and variations thereof have been the only methods achieving acceleration in this standard blackbox model. In contrast, our algorithm is significantly different from AGD, as it relies on a predictor-corrector approach similar to that used by Mirror-Prox [Nemirovski, 2004] and Extra-Gradient Descent [Korpelevich, 1977] for the solution of convex-concave saddle-point problems. For this reason, we dub our algorithm Accelerated Extra-Gradient Descent (AXGD). Its construction is motivated by the discretization of accelerated continuous-time dynamics [Krichene et al., 2015] using the classical implicit Euler method. Our analysis explicitly shows the effects of discretization through a conceptually novel primal-dual viewpoint. Moreover, we show that the method is quite general: with appropriate choices of step lengths, it attains optimal convergence rates for other classes of objectives, e.g., those with generalized smoothness properties or those that are non-smooth and Lipschitz-continuous. Finally, we present experiments showing that our algorithm matches the performance of Nesterov's method, while appearing more robust to noise in some cases.
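
To make the predictor-corrector idea concrete, the following is a minimal Python sketch of an accelerated extra-gradient iteration in the unconstrained Euclidean setting (mirror map psi(x) = ||x||^2 / 2, so the dual map is the identity). The update form and the step sizes a_k = (k+1)/(2L), which satisfy a_k^2 <= A_k / L for the smoothness constant L, are assumptions made here for illustration and may differ from the paper's exact scheme; the function axgd_sketch and its parameter names are hypothetical.

    # Minimal sketch of an accelerated extra-gradient (predictor-corrector)
    # iteration for minimizing an L-smooth convex function, unconstrained
    # Euclidean case. Update form and step sizes are illustrative assumptions.
    import numpy as np

    def axgd_sketch(grad_f, x0, L, num_iters=100):
        """grad_f: gradient oracle of an L-smooth convex f; x0: start point."""
        x = x0.copy()      # primal (averaged) iterate
        z = x0.copy()      # dual iterate; identity dual map in the Euclidean case
        A = 0.0            # running sum of step sizes a_1 + ... + a_k
        for k in range(1, num_iters + 1):
            a = (k + 1) / (2.0 * L)      # assumed step size, gives a_k^2 <= A_k / L
            A_next = A + a
            # Predictor: tentative primal point, then an extra-gradient dual step.
            x_hat = (A * x + a * z) / A_next
            z_hat = z - a * grad_f(x_hat)
            # Corrector: recompute the primal point using the corrected dual variable.
            x = (A * x + a * z_hat) / A_next
            z = z - a * grad_f(x)
            A = A_next
        return x

    if __name__ == "__main__":
        # Example: minimize the convex quadratic f(x) = 0.5 * x^T Q x.
        Q = np.diag([1.0, 10.0, 100.0])
        L = 100.0                        # largest eigenvalue of Q
        x_out = axgd_sketch(lambda x: Q @ x, np.array([1.0, 1.0, 1.0]), L)
        print(x_out)                     # should be close to the origin

Each iteration queries the gradient oracle twice: once at the predicted point to correct the dual variable, and once at the corrected primal point, mirroring the predictor-corrector structure of Mirror-Prox and Extra-Gradient Descent mentioned in the abstract.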
Keywords
  • Acceleration
  • dynamical systems
  • discretization
  • first-order methods


References

  1. Zeyuan Allen-Zhu and Lorenzo Orecchia. Linear coupling: An ultimate unification of gradient and mirror descent. In Proc. ITCS'17, 2017.
  2. Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In Proc. IEEE FOCS'14, 2014.
  3. Aharon Ben-Tal and Arkadi Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. MPS-SIAM Series on Optimization. SIAM, 2001.
  4. Sébastien Bubeck. Theory of convex optimization for machine learning. CoRR, abs/1405.4980, 2014. URL: http://arxiv.org/abs/1405.4980v1.
  5. Sébastien Bubeck, Yin Tat Lee, and Mohit Singh. A geometric alternative to Nesterov's accelerated gradient descent. CoRR, abs/1506.08187, 2015. URL: http://arxiv.org/abs/1506.08187.
  6. Olivier Devolder, François Glineur, and Yurii Nesterov. First-order methods of smooth convex optimization with inexact oracle. Math. Prog., 146(1-2):37-75, 2014.
  7. Jelena Diakonikolas and Lorenzo Orecchia. The approximate gap technique: A unified approach to optimal first-order methods, 2017. Manuscript.
  8. A. Ene and H. L. Nguyen. Constrained submodular maximization: Beyond 1/e. In Proc. IEEE FOCS'16, 2016.
  9. E. Hairer, S. P. Nørsett, and G. Wanner. Solving Ordinary Differential Equations I: Nonstiff Problems (2nd revised ed.). Springer Ser. Comput. Math. Springer-Verlag New York, Inc., 1993.
  10. Moritz Hardt. Robustness vs acceleration, 2014. URL: http://blog.mrtz.org/2014/08/18/robustness-versus-acceleration.html.
  11. Rahul Jain, Zhengfeng Ji, Sarvagya Upadhyay, and John Watrous. QIP = PSPACE. Journal of the ACM, 58(6):30, 2011.
  12. Jonathan A. Kelner, Yin Tat Lee, Lorenzo Orecchia, and Aaron Sidford. An almost-linear-time algorithm for approximate max flow in undirected graphs, and its multicommodity generalizations. In Proc. ACM-SIAM SODA'14, 2014.
  13. Jonathan A. Kelner, Lorenzo Orecchia, Aaron Sidford, and Zeyuan Allen Zhu. A simple, combinatorial algorithm for solving SDD systems in nearly-linear time. In Proc. ACM STOC'13, 2013.
  14. G. M. Korpelevich. The extragradient method for finding saddle points and other problems. Matekon: Translations of Russian & East European Mathematical Economics, 13(4):35-49, 1977.
  15. Walid Krichene, Alexandre Bayen, and Peter L. Bartlett. Accelerated mirror descent in continuous and discrete time. In Proc. NIPS'15, 2015.
  16. Guanghui Lan. An optimal method for stochastic composite optimization. Math. Prog., 133(1-2):365-397, January 2011.
  17. Yin Tat Lee, Satish Rao, and Nikhil Srivastava. A new approach to computing maximum flows using electrical flows. In Proc. ACM STOC'13, 2013.
  18. Arkadi Nemirovski. Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim., 15(1):229-251, 2004.
  19. Arkadi S. Nemirovski and Yurii E. Nesterov. Optimal methods of smooth convex minimization. Zh. Vychisl. Mat. i Mat. Fiz., 25(3):356-369, 1985.
  20. Arkadii Nemirovskii and David B. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, 1983.
  21. Yurii Nesterov. Universal gradient methods for convex optimization problems. Math. Prog., 152(1-2):381-404, 2015.
  22. Yurii Nesterov. A method of solving a convex programming problem with convergence rate O(1/k²). In Doklady AN SSSR (translated as Soviet Mathematics Doklady), volume 269, pages 543-547, 1983.
  23. Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, 2004.
  24. Yurii Nesterov. Excessive gap technique in nonsmooth convex minimization. SIAM J. Optim., 16(1):235-249, January 2005.
  25. Yurii Nesterov. Smooth minimization of non-smooth functions. Math. Prog., 103(1):127-152, December 2005.
  26. Yurii Nesterov. Accelerating the cubic regularization of Newton's method on convex problems. Math. Prog., 112(1):159-181, 2008.
  27. Yurii Nesterov. Gradient methods for minimizing composite functions. Math. Prog., 140(1):125-161, 2013.
  28. Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Springer Science & Business Media, 2013.
  29. Jonah Sherman. Nearly maximum flows in nearly linear time. In Proc. IEEE FOCS'13, 2013.
  30. Daniel A. Spielman and Shang-Hua Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proc. ACM STOC'04, 2004.
  31. Weijie Su, Stephen Boyd, and Emmanuel Candès. A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights. In Proc. NIPS'14, 2014.
  32. Paul Tseng. On accelerated proximal gradient methods for convex-concave optimization, 2008.
  33. Andre Wibisono, Ashia C. Wilson, and Michael I. Jordan. A variational perspective on accelerated methods in optimization. Proceedings of the National Academy of Sciences, 2016.