From Averaging to Acceleration, There is Only a Step-size

Nicolas Flammarion; Francis Bach

COLT 2015

Abstract

We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for quadratic non-strongly-convex problems may be reformulated as constant-parameter second-order difference equation algorithms, for which stability of the system is equivalent to convergence at rate O(1/n^2), where n is the number of iterations. We provide a detailed analysis of the eigenvalues of the corresponding linear dynamical system, showing various oscillatory and non-oscillatory behaviors, together with a sharp stability result with explicit constants. We also consider the setting where noisy gradients are available; there we extend our general convergence result, which suggests an alternative algorithm (i.e., with different step-sizes) that exhibits the good aspects of both averaging and acceleration.
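
The reformulation described above can be checked numerically in its simplest case. The sketch below is our own illustration, not the authors' code: it covers only averaged gradient descent on a quadratic f(θ) = ½ θᵀHθ - qᵀθ, assumes H is invertible so that θ* = H^-1 q exists, and the helper names make_quadratic, averaged_gd, rescaled_errors, recursion_residual and scaled_gaps are hypothetical. Let ψ_k be the plain gradient-descent iterates and θ_n their running average. Writing u_n = (n+1)(θ_n - θ*), the averaging identity ψ_n = (n+1)θ_n - nθ_{n-1} combined with the gradient step ψ_n - θ* = (I - γH)(ψ_{n-1} - θ*) gives the constant-coefficient recursion u_n = (2I - γH)u_{n-1} - (I - γH)u_{n-2}. For an eigenvalue h of H its characteristic roots are 1 and 1 - γh, so u_n stays bounded when 0 < γh ≤ 2. In that case f(θ_n) - f* = ½ u_nᵀHu_n / (n+1)^2 = O(1/n^2). When γh > 2 the root 1 - γh lies below -1 and the system is unstable.

```python
import numpy as np


def make_quadratic(eigenvalues, seed=0):
    """Random quadratic f(theta) = 0.5 theta^T H theta - q^T theta with a given spectrum."""
    rng = np.random.default_rng(seed)
    d = len(eigenvalues)
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal basis
    H = Q @ np.diag(eigenvalues) @ Q.T
    q = rng.standard_normal(d)
    return H, q


def averaged_gd(H, q, gamma, n_iter, theta0):
    """Plain gradient descent psi_k = psi_{k-1} - gamma * (H psi_{k-1} - q),
    returning the running averages theta_n = (psi_0 + ... + psi_n) / (n + 1)."""
    psi = theta0.copy()
    running_sum = psi.copy()
    averages = [psi.copy()]
    for n in range(1, n_iter + 1):
        psi = psi - gamma * (H @ psi - q)  # gradient step on the quadratic
        running_sum += psi
        averages.append(running_sum / (n + 1))
    return np.array(averages)


def rescaled_errors(H, q, gamma, n_iter, theta0):
    """Return u_n = (n + 1) * (theta_n - theta_star) for n = 0..n_iter (H assumed invertible)."""
    theta_star = np.linalg.solve(H, q)
    thetas = averaged_gd(H, q, gamma, n_iter, theta0)
    scale = np.arange(1, n_iter + 2)[:, None]  # the factor (n + 1) for each row n
    return scale * (thetas - theta_star)


def recursion_residual(H, gamma, u):
    """Relative mismatch with u_n = (2I - gamma H) u_{n-1} - (I - gamma H) u_{n-2}, for n >= 2."""
    I = np.eye(H.shape[0])
    predicted = u[1:-1] @ (2 * I - gamma * H).T - u[:-2] @ (I - gamma * H).T
    return np.max(np.abs(u[2:] - predicted)) / np.max(np.abs(u))


def scaled_gaps(H, u):
    """(n + 1)^2 * (f(theta_n) - f_star), which equals 0.5 * u_n^T H u_n."""
    return 0.5 * np.einsum("ni,ij,nj->n", u, H, u)


H, q = make_quadratic(np.linspace(0.05, 1.0, 5))
L = np.linalg.eigvalsh(H).max()  # largest eigenvalue (smoothness constant)
theta0 = np.zeros(5)

u_stable = rescaled_errors(H, q, 1.0 / L, 300, theta0)    # gamma * L = 1: stable
u_unstable = rescaled_errors(H, q, 2.5 / L, 300, theta0)  # gamma * L = 2.5 > 2: unstable

print(recursion_residual(H, 1.0 / L, u_stable))
print(scaled_gaps(H, u_stable)[-1], scaled_gaps(H, u_unstable)[-1])
```

The recursion itself is an algebraic identity, so its residual is at rounding level for any step-size. What the step-size controls is stability. With γL = 1 the scaled gap (n+1)^2 (f(θ_n) - f*) stays bounded. With γL = 2.5 the top eigen-direction has root 1 - 2.5 = -1.5, and the scaled gap grows geometrically.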

Keywords

stability analysis, gradient descent, accelerated gradient, quadratic optimization, noisy gradient, heavy-ball method
