Faster Perturbed Stochastic Gradient Methods for Finding Local Minima

Zixiang Chen; Dongruo Zhou; Quanquan Gu

ALT 2022

Abstract

Escaping saddle points and finding local minima is a central problem in nonconvex optimization. Perturbed gradient methods are perhaps the simplest approach to this problem. However, to find $(\epsilon, \sqrt{\epsilon})$-approximate local minima, the best existing stochastic gradient complexity for this type of algorithm is $\tilde O(\epsilon^{-3.5})$, which is not optimal. In this paper, we propose LENA (Last stEp shriNkAge), a faster perturbed stochastic gradient framework for finding local minima. We show that LENA with stochastic gradient estimators such as SARAH/SPIDER and STORM can find $(\epsilon, \epsilon_{H})$-approximate local minima within $\tilde O(\epsilon^{-3} + \epsilon_{H}^{-6})$ stochastic gradient evaluations (or $\tilde O(\epsilon^{-3})$ when $\epsilon_H = \sqrt{\epsilon}$). The core idea of our framework is a step-size shrinkage scheme that controls the average movement of the iterates, which leads to faster convergence to local minima.
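To make the terminology concrete: under the standard definition, a point $x$ is an $(\epsilon, \epsilon_H)$-approximate local minimum of $F$ when $\|\nabla F(x)\| \le \epsilon$ and $\lambda_{\min}(\nabla^2 F(x)) \ge -\epsilon_H$. The sketch below illustrates the generic perturbed gradient idea on a toy problem. It is not LENA: it uses exact gradients rather than SARAH/SPIDER or STORM estimators, it has no step-size shrinkage, and it checks the Hessian directly, which practical perturbed methods avoid. The objective, the function names, and all parameter values are assumptions chosen for illustration.

```python
import math
import random

# Toy objective (an assumption for illustration, not from the paper):
#   f(x, y) = 0.5*x^2 - 0.5*y^2 + 0.25*y^4
# It has a strict saddle at (0, 0) and local minima at (0, +1) and (0, -1).

def grad(p):
    """Exact gradient of the toy objective at p = (x, y)."""
    x, y = p
    return (x, -y + y ** 3)

def min_hessian_eig(p):
    """Smallest Hessian eigenvalue; the toy Hessian is diag(1, -1 + 3*y^2)."""
    _, y = p
    return min(1.0, -1.0 + 3.0 * y ** 2)

def is_approx_local_min(p, eps, eps_h):
    """Check ||grad|| <= eps and lambda_min(Hessian) >= -eps_h."""
    gx, gy = grad(p)
    return math.hypot(gx, gy) <= eps and min_hessian_eig(p) >= -eps_h

def perturbed_gd(p0, eps, eps_h, step=0.1, radius=1e-2, max_iter=10_000, seed=0):
    """Gradient descent that adds a small random kick at approximate saddles."""
    rng = random.Random(seed)
    x, y = p0
    for _ in range(max_iter):
        gx, gy = grad((x, y))
        if math.hypot(gx, gy) > eps:
            # Large gradient: take an ordinary gradient step.
            x, y = x - step * gx, y - step * gy
        elif is_approx_local_min((x, y), eps, eps_h):
            # Small gradient and no strongly negative curvature: stop.
            return (x, y)
        else:
            # Small gradient but negative curvature: perturb on a circle
            # of the given radius so descent can leave the saddle.
            theta = rng.uniform(0.0, 2.0 * math.pi)
            x, y = x + radius * math.cos(theta), y + radius * math.sin(theta)
    return (x, y)
```

Calling `perturbed_gd((0.0, 0.0), eps=1e-3, eps_h=math.sqrt(1e-3))` starts exactly at the saddle, where plain gradient descent would never move, and the perturbation lets the iterates reach one of the two minima. The choice $\epsilon_H = \sqrt{\epsilon}$ mirrors the special case in the abstract, where $\epsilon_H^{-6} = \epsilon^{-3}$ and the two terms of the complexity bound coincide.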

Topics

Machine Learning > Optimization & Theory > Optimization
Mathematics & Optimization > Optimization > Continuous Optimization

Keywords

stochastic gradient, nonconvex optimization, convergence rate, saddle point, local minima, perturbed gradient method

Related papers

Efficient and Optimal Fixed-Time Regret with Two Experts 2022

The Mirror Langevin Algorithm Converges with Vanishing Bias 2022

Infinitely Divisible Noise in the Low Privacy Regime 2022

Metric Entropy Duality and the Sample Complexity of Outcome Indistinguishability 2022

Universally Consistent Online Learning with Arbitrarily Dependent Responses 2022