A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)

Jain, Prateek; Kakade, Sham M.; Kidambi, Rahul; Netrapalli, Praneeth; Pillutla, Venkata Krishna; Sidford, Aaron

doi:10.4230/LIPIcs.FSTTCS.2017.2

Abstract

This work provides a simplified proof of the statistical minimax
optimality of (iterate-averaged) stochastic gradient descent (SGD) for
the special case of least squares. The result is obtained by
analyzing SGD as a stochastic process and by sharply characterizing
the stationary covariance matrix of this process. The resulting finite-rate
optimality characterization captures the constant factors and also
addresses model mis-specification.
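To make this setting concrete, here is a minimal Python sketch built on assumptions of our own choosing, not the authors' code: a well-specified linear model with standard Gaussian features and Gaussian noise, a constant step size γ, and averaging of the iterates only after a burn-in period (a simple form of tail averaging). With a constant step size the iterates w_t form a time-homogeneous Markov chain, so the sketch also records the empirical second moment of w_t - w* over the averaging window, a crude estimate of the chain's stationary covariance. The function name `averaged_sgd_least_squares` and all parameter values are hypothetical.

```python
import random


def averaged_sgd_least_squares(w_star, noise_std, step_size, n_steps, burn_in, seed=0):
    """Constant step-size SGD on streaming least squares, with iterate averaging.

    Hypothetical toy model: x ~ N(0, I_d), y = <x, w_star> + eps, eps ~ N(0, noise_std**2).
    Returns (w_bar, second_moment): the average of the iterates w_t for t >= burn_in,
    and the empirical second moment of (w_t - w_star) over the same window.
    """
    if burn_in >= n_steps:
        raise ValueError("burn_in must be smaller than n_steps")
    rng = random.Random(seed)
    d = len(w_star)
    w = [0.0] * d                              # initial iterate w_0 = 0
    w_sum = [0.0] * d                          # running sum of the averaged iterates
    outer_sum = [[0.0] * d for _ in range(d)]  # running sum of (w - w*)(w - w*)^T
    count = 0
    for t in range(n_steps):
        # Draw one fresh sample (x, y) from the streaming distribution.
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = sum(xi * wi for xi, wi in zip(x, w_star)) + rng.gauss(0.0, noise_std)
        # Gradient of 0.5 * (<x, w> - y)^2 with respect to w is (<x, w> - y) * x.
        residual = sum(xi * wi for xi, wi in zip(x, w)) - y
        w = [wi - step_size * residual * xi for wi, xi in zip(w, x)]
        if t >= burn_in:
            # Accumulate statistics only once the chain has (approximately) mixed.
            count += 1
            diff = [wi - si for wi, si in zip(w, w_star)]
            for i in range(d):
                w_sum[i] += w[i]
                for j in range(d):
                    outer_sum[i][j] += diff[i] * diff[j]
    w_bar = [s / count for s in w_sum]
    second_moment = [[v / count for v in row] for row in outer_sum]
    return w_bar, second_moment


if __name__ == "__main__":
    w_star = [1.0, -2.0]
    w_bar, second_moment = averaged_sgd_least_squares(
        w_star, noise_std=0.5, step_size=0.05, n_steps=50_000, burn_in=1_000
    )
    print("averaged iterate:", w_bar)  # should lie close to w_star
    print("stationary second moment estimate:", second_moment)
```

For this Gaussian toy model the stationary covariance can be worked out exactly: it is γσ²/(2 − (d+2)γ) times the identity, about 0.0069 per diagonal entry for the values above, so the printed estimate can be sanity-checked against it. The individual iterates keep fluctuating at that scale, whereas the error of the averaged iterate shrinks as more iterates are averaged.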

Cite As

Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Venkata Krishna Pillutla, and Aaron Sidford. A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares). In 37th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 93, pp. 2:1-2:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.FSTTCS.2017.2
