On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems

Panayotis Mertikopoulos; Nadav Hallak; Ali Kavis; Volkan Cevher

2020 NIPS NeurIPS 2020

On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems

Abstract

In this paper, we analyze the trajectories of stochastic gradient descent (SGD) with the aim of understanding their convergence properties in non-convex problems. We first show that the sequence of iterates generated by SGD remains bounded and converges with probability $1$ under a very broad range of step-size schedules. Subsequently, we prove that the algorithm's rate of convergence to local minimizers with a positive-definite Hessian is $O(1/n^p)$ if the method is run with a $Θ(1/n^p)$ step-size. This provides an important guideline for tuning the algorithm's step-size as it suggests that a cool-down phase with a vanishing step-size could lead to significant performance gains; we demonstrate this heuristic using ResNet architectures on CIFAR. Finally, going beyond existing positive probability guarantees, we show that SGD avoids strict saddle points/manifolds with probability $1$ for the entire spectrum of step-size policies considered.

🧭 Keyword Pioneer — step-size schedule

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Panayotis Mertikopoulos , Nadav Hallak , Ali Kavis , Volkan Cevher

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Machine Learning > Optimization & Theory > Stochastic Processes

Keywords

stochastic gradient descent non-convex optimization saddle point vanishing gradient step-size schedule

Download PDF

Related papers

Higher-Order Spectral Clustering of Directed Graphs 2020

Self-Supervised MultiModal Versatile Networks 2020

Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates 2020

Causal Intervention for Weakly-Supervised Semantic Segmentation 2020

Taming Discrete Integration via the Boon of Dimensionality 2020