On the Influence of Momentum Acceleration on Online Learning

Kun Yuan; Bicheng Ying; Ali H. Sayed

2016 JMLR JMLR 2016

On the Influence of Momentum Acceleration on Online Learning

Abstract

The article examines in some detail the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime. The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value. The size of the re-scaling is determined by the value of the momentum parameter. The equivalence result is established for all time instants and not only in steady-state. The analysis is carried out for general strongly convex and smooth risk functions, and is not limited to quadratic risks. One notable conclusion is that the well-known benefits of momentum constructions for deterministic optimization problems do not necessarily carry over to the adaptive online setting when small constant step-sizes are used to enable continuous adaptation and learning in the presence of persistent gradient noise. From simulations, the equivalence between momentum and standard stochastic gradient methods is also observed for non-differentiable and non-convex problems. [abs] [ pdf ][ bib ] © JMLR 2016. (edit, beta)

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — momentum method

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

🐣 Hot Topic Early Bird — convergence analysis

Authors

Kun Yuan , Bicheng Ying , Ali H. Sayed

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Machine Learning > Optimization & Theory > Optimization Mathematics & Optimization > Optimization > Stochastic Methods Machine Learning > Learning Types > Online Learning

Keywords

stochastic gradient descent online learning convergence analysis mean-square error optimization theory stochastic gradient method learning rate momentum method convex function

Download PDF

Related papers

Trend Filtering on Graphs 2016

Causal Inference through a Witness Protection Program 2016

A Characterization of Linkage-Based Hierarchical Clustering 2016

How to Center Deep Boltzmann Machines 2016

Minimax Rates in Permutation Estimation for Feature Matching 2016