NeurIPS 2018
How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective
Abstract
The question of which global minima are accessible to a stochastic gradient descent (SGD) algorithm with a specific learning rate and batch size is studied from the perspective of dynamical stability. The concept of non-uniformity is introduced, which, together with sharpness, characterizes the stability of a global minimum and hence whether a particular SGD algorithm can reach that minimum. In particular, this analysis shows that learning rate and batch size play different roles in minima selection. Extensive empirical results seem to correlate well with the theoretical findings and provide further support for these claims.
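To make the stability idea concrete, here is a minimal sketch on a toy one-dimensional problem; it is an illustration under stated assumptions, not the authors' code or their exact theorem. It assumes each sample's loss near the global minimum x = 0 is 0.5 * a_i * x^2, takes "sharpness" to be the mean of the curvatures a_i and "non-uniformity" to be their standard deviation, and calls the minimum stable when the expected squared distance to it does not grow after one SGD step. The function names (`second_moment_factor`, `closed_form_factor`, `is_stable`) and the example curvatures are hypothetical. The closed form relies only on the standard variance of a sample mean drawn without replacement.

```python
from itertools import combinations
from statistics import fmean, pvariance


def second_moment_factor(curvatures, lr, batch_size):
    """Exact E[(1 - lr * batch_mean_curvature)^2] over all batches
    of the given size drawn without replacement (toy 1-D model)."""
    factors = [
        (1.0 - lr * fmean(batch)) ** 2
        for batch in combinations(curvatures, batch_size)
    ]
    return fmean(factors)


def closed_form_factor(curvatures, lr, batch_size):
    """Same quantity written in terms of sharpness and non-uniformity."""
    n = len(curvatures)
    sharpness = fmean(curvatures)              # mean curvature (assumed definition)
    non_uniformity_sq = pvariance(curvatures)  # population variance of curvatures
    sampling = (n - batch_size) / (batch_size * (n - 1))
    return (1.0 - lr * sharpness) ** 2 + lr ** 2 * sampling * non_uniformity_sq


def is_stable(curvatures, lr, batch_size):
    """Stable (in this toy sense) if E[x^2] does not grow after one step."""
    return second_moment_factor(curvatures, lr, batch_size) <= 1.0


# Two hypothetical minima with the same sharpness (mean curvature 1.0)
# but very different non-uniformity.
uniform = [1.0, 1.0, 1.0, 1.0]
non_uniform = [0.0, 0.0, 0.0, 4.0]

for name, curv in [("uniform", uniform), ("non_uniform", non_uniform)]:
    for batch_size in (1, 2, 4):
        print(name, batch_size, is_stable(curv, lr=0.8, batch_size=batch_size))
```

In this toy setting both minima are equally sharp, so full-batch gradient descent treats them the same at a learning rate of 0.8. Only the non-uniform one becomes unstable at batch size 1, which illustrates how batch size can act through non-uniformity while learning rate also acts through sharpness.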
Topics
Machine Learning > Optimization & Theory > Learning Theory
Machine Learning > Optimization & Theory > Neural Network Optimization
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Optimization & Theory > Theory
Deep Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Optimization & Theory > Theory