A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)

Authors: Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Venkata Krishna Pillutla, and Aaron Sidford



PDF: LIPIcs.FSTTCS.2017.2.pdf (383 kB, 10 pages)

Cite As

Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Venkata Krishna Pillutla, and Aaron Sidford. A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares). In 37th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 93, pp. 2:1-2:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.FSTTCS.2017.2

Abstract

This work provides a simplified proof of the statistical minimax optimality of iterate-averaged stochastic gradient descent (SGD) for the special case of least squares. The result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite-rate optimality characterization captures constant factors and addresses model mis-specification.
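
The object of study is single-pass SGD on the square loss, with the final estimate taken as the average of the iterates (Polyak-Ruppert averaging); viewed as a Markov chain, the iterates mix to a stationary distribution whose covariance governs the risk of the averaged iterate. As a rough, self-contained illustration of the procedure only, here is a minimal numpy sketch; the dimension, step size, noise level, and Gaussian data model are illustrative assumptions, not values from the paper.

import numpy as np

rng = np.random.default_rng(0)
d, T = 10, 100_000                       # dimension and sample count: illustrative choices
w_star = rng.normal(size=d)              # hypothetical ground-truth parameter
w = np.zeros(d)                          # current SGD iterate
w_bar = np.zeros(d)                      # running average of the iterates (Polyak-Ruppert)
step = 0.01                              # constant step size (illustrative, not from the paper)

for t in range(1, T + 1):
    x = rng.normal(size=d)               # fresh covariate each step: a single pass over the stream
    y = x @ w_star + 0.1 * rng.normal()  # noisy response (well-specified model, for simplicity)
    grad = (x @ w - y) * x               # stochastic gradient of the square loss 0.5*(x.w - y)**2
    w -= step * grad                     # SGD update
    w_bar += (w - w_bar) / t             # online update of the iterate average

# For isotropic Gaussian covariates, excess risk equals 0.5 * ||w_bar - w_star||^2.
print("excess risk of averaged iterate:", 0.5 * np.sum((w_bar - w_star) ** 2))

Averaging is what makes the constant step size viable: the raw iterate w oscillates in a noise ball around w_star, while the average w_bar concentrates at the statistically optimal rate.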
Keywords
  • Stochastic Gradient Descent
  • Minimax Optimality
  • Least Squares Regression

