Generalization Performance of Multi-pass Stochastic Gradient Descent with Convex Loss Functions

Yunwen Lei; Ting Hu; Ke Tang

2021 JMLR JMLR 2021

Generalization Performance of Multi-pass Stochastic Gradient Descent with Convex Loss Functions

Abstract

Stochastic gradient descent (SGD) has become the method of choice to tackle large-scale datasets due to its low computational cost and good practical performance. Learning rate analysis, either capacity-independent or capacity-dependent, provides a unifying viewpoint to study the computational and statistical properties of SGD, as well as the implicit regularization by tuning the number of passes. Existing capacity-independent learning rates require a nontrivial bounded subgradient assumption and a smoothness assumption to be optimal. Furthermore, existing capacity-dependent learning rates are only established for the specific least squares loss with a special structure. In this paper, we provide both optimal capacity-independent and capacity-dependent learning rates for SGD with general convex loss functions. Our results require neither bounded subgradient assumptions nor smoothness assumptions, and are stated with high probability. We achieve this improvement by a refined estimate on the norm of SGD iterates based on a careful martingale analysis and concentration inequalities on empirical processes. [abs] [ pdf ][ bib ] © JMLR 2021. (edit, beta)

🧭 Keyword Pioneer — martingale analysis

🐣 Hot Topic Early Bird — stochastic gradient descent

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

Authors

Yunwen Lei , Ting Hu , Ke Tang

Topics

Machine Learning > Optimization & Theory > Statistical Learning Machine Learning > Optimization & Theory > Stochastic Methods Machine Learning > Learning Types > Deep Learning Deep Learning > Optimization & Theory > Neural Network Optimization

Keywords

stochastic gradient descent convex loss learning rate implicit regularization generalization performance martingale analysis

Download PDF

Related papers

Optimal Feedback Law Recovery by Gradient-Augmented Sparse Polynomial Regression 2021

Normalizing Flows for Probabilistic Modeling and Inference 2021

Determining the Number of Communities in Degree-corrected Stochastic Block Models 2021

Guided Visual Exploration of Relations in Data Sets 2021

Safe Policy Iteration: A Monotonically Improving Approximate Policy Iteration Approach 2021