Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent

Authors: Zeyuan Allen-Zhu, Lorenzo Orecchia




Cite As

Zeyuan Allen-Zhu and Lorenzo Orecchia. Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent. In 8th Innovations in Theoretical Computer Science Conference (ITCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 67, pp. 3:1-3:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Abstract

First-order methods play a central role in large-scale machine learning. Although many variations exist, each suited to a particular problem, almost all such methods fundamentally rely on two types of algorithmic steps: gradient descent, which yields primal progress, and mirror descent, which yields dual progress. We observe that the performance of gradient descent and mirror descent is complementary, so that faster algorithms can be designed by "linearly coupling" the two. We show how to reconstruct Nesterov's accelerated gradient methods using linear coupling, which gives a cleaner interpretation than Nesterov's original proofs. We also demonstrate the power of linear coupling by extending it to many settings to which Nesterov's methods do not apply.
Keywords

  • linear coupling
  • gradient descent
  • mirror descent
  • acceleration
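The abstract describes combining a gradient-descent step (primal progress) with a mirror-descent step (dual progress) through a linear coupling of the two iterate sequences. As an illustration only, here is a minimal sketch of such a coupling in the smooth Euclidean setting (so the mirror step reduces to a scaled gradient step), with the step sizes chosen as in the standard accelerated analysis; the function name and parameter choices are ours, not taken from the paper.

```python
import numpy as np

def linear_coupling(grad, x0, L, T):
    """Sketch of a linearly coupled gradient/mirror scheme for a smooth
    convex f with L-Lipschitz gradient, using a Euclidean mirror map."""
    y = np.asarray(x0, dtype=float)  # gradient-descent iterate (primal)
    z = y.copy()                     # mirror-descent iterate (dual)
    for k in range(T):
        alpha = (k + 2) / (2.0 * L)  # growing mirror-step size
        tau = 1.0 / (alpha * L)      # coupling weight, tau in (0, 1]
        x = tau * z + (1 - tau) * y  # linearly couple the two iterates
        g = grad(x)
        y = x - g / L                # gradient step: primal progress
        z = z - alpha * g            # mirror step: dual progress
    return y
```

With these choices the scheme recovers the O(1/T^2) accelerated rate on smooth convex objectives; for example, minimizing a quadratic f(x) = x^T D x / 2 with L equal to the largest eigenvalue of D drives the iterate toward the minimizer at 0.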



