2016
NIPS
NeurIPS 2016
Optimal Learning for Multi-pass Stochastic Gradient Methods
Abstract
We analyze the learning properties of the stochastic gradient method when multiple passes over the data and mini-batches are allowed. In particular, we consider the square loss and show that for a universal step-size choice, the number of passes acts as a regularization parameter, and optimal finite sample bounds can be achieved by early-stopping. Moreover, we show that larger step-sizes are allowed when considering mini-batches. Our analysis is based on a unifying approach, encompassing both batch and stochastic gradient methods as special cases.
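The setting described in the abstract can be illustrated with a minimal sketch, written in plain Python under several assumptions that are not taken from the paper: the function names (`multipass_sgd`, `mean_squared_error`, `early_stopped`), the reshuffle-once-per-pass sampling scheme, the constant step size, and the use of a held-out validation set to pick the stopping pass are all illustrative choices. It runs mini-batch stochastic gradient descent on the square loss for several passes, records the iterate after each pass, and treats the number of passes as the tuning parameter.

```python
import random


def multipass_sgd(xs, ys, step_size, batch_size, passes, seed=0):
    """Mini-batch SGD on the square loss for a linear model.

    Returns the weight vector recorded at the end of every pass, so the
    number of passes can be chosen afterwards (early stopping).
    """
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    iterates = []
    for _ in range(passes):
        # Assumption: reshuffle the sample once per pass (illustrative choice).
        order = list(range(n))
        rng.shuffle(order)
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            grad = [0.0] * d
            for i in batch:
                residual = sum(wj * xj for wj, xj in zip(w, xs[i])) - ys[i]
                for j in range(d):
                    # Averaged gradient of 0.5 * (w.x - y)^2 over the batch.
                    grad[j] += residual * xs[i][j] / len(batch)
            w = [wj - step_size * gj for wj, gj in zip(w, grad)]
        iterates.append(list(w))
    return iterates


def mean_squared_error(w, xs, ys):
    """Average squared prediction error of the linear model w."""
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(wj * xj for wj, xj in zip(w, x))
        total += (prediction - y) ** 2
    return total / len(xs)


def early_stopped(iterates, val_xs, val_ys):
    """Pick the pass (1-based) whose iterate has the lowest validation error."""
    errors = [mean_squared_error(w, val_xs, val_ys) for w in iterates]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best + 1, iterates[best]
```

Here the regularization effect comes only from when the iteration is stopped: there is no explicit penalty term, and `early_stopped` simply selects among the per-pass iterates. The validation-based selection rule is a practical stand-in and is not the stopping rule analyzed in the paper.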
Interdisciplinary Bridge: Machine Learning and Mathematics & Optimization
Hot Topic Early Bird: learning rate
Cross-Pollinator: Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio
Authors
Topics
Machine Learning > Optimization & Theory > Neural Network Optimization
Machine Learning > Optimization & Theory > Optimization
Mathematics & Optimization > Optimization > Stochastic Methods
Machine Learning > Optimization & Theory > Stochastic Methods
Machine Learning > Learning Types > Optimization