A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)

Authors: Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Venkata Krishna Pillutla, and Aaron Sidford



PDF: LIPIcs.FSTTCS.2017.2.pdf (383 kB, 10 pages)

Cite As

Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Venkata Krishna Pillutla, and Aaron Sidford. A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares). In 37th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 93, pp. 2:1-2:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.FSTTCS.2017.2

Abstract

This work provides a simplified proof of the statistical minimax optimality of iterate-averaged stochastic gradient descent (SGD) for the special case of least squares. The result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite-rate optimality characterization captures constant factors and addresses model mis-specification.
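
The object of study is single-pass SGD on the square loss, with the final estimate taken as the average of the iterates (Polyak-Ruppert averaging); viewed as a Markov chain, the iterates mix to a stationary distribution whose covariance governs the risk of the averaged iterate. As a rough, self-contained illustration of the procedure only, here is a minimal numpy sketch; the dimension, step size, noise level, and Gaussian data model are illustrative assumptions, not values from the paper.

import numpy as np

rng = np.random.default_rng(0)
d, T = 10, 100_000                       # dimension and sample count: illustrative choices
w_star = rng.normal(size=d)              # hypothetical ground-truth parameter
w = np.zeros(d)                          # current SGD iterate
w_bar = np.zeros(d)                      # running average of the iterates (Polyak-Ruppert)
step = 0.01                              # constant step size (illustrative, not from the paper)

for t in range(1, T + 1):
    x = rng.normal(size=d)               # fresh covariate each step: a single pass over the stream
    y = x @ w_star + 0.1 * rng.normal()  # noisy response (well-specified model, for simplicity)
    grad = (x @ w - y) * x               # stochastic gradient of the square loss 0.5*(x.w - y)**2
    w -= step * grad                     # SGD update
    w_bar += (w - w_bar) / t             # online update of the iterate average

# For isotropic Gaussian covariates, excess risk equals 0.5 * ||w_bar - w_star||^2.
print("excess risk of averaged iterate:", 0.5 * np.sum((w_bar - w_star) ** 2))

Averaging is what makes the constant step size viable: the raw iterate w oscillates in a noise ball around w_star, while the average w_bar concentrates at the statistically optimal rate.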
Keywords
  • Stochastic Gradient Descent
  • Minimax Optimality
  • Least Squares Regression

