A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

Nicolas L. Roux; Mark Schmidt; Francis R. Bach

NIPS (NeurIPS) 2012

Abstract

We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. In a machine learning context, numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms, both in terms of optimizing the training error and reducing the test error quickly.
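The abstract says only that the method keeps "a memory of previous gradient values". As a rough illustration of that idea, the sketch below stores the most recent gradient of each term and steps along the average of the stored gradients. The specific update rule, the least-squares objective, the step size, and the name `sag_least_squares` are all assumptions made for this example and are not taken from the text above.

```python
import random


def sag_least_squares(A, b, step, n_iters, seed=0):
    """Minimize (1/n) * sum_i 0.5 * (A[i] . x - b[i])**2.

    Illustrative sketch of a gradient-memory method (assumed update rule):
    one term is sampled per iteration, its stored gradient is refreshed,
    and the step uses the average of all stored gradients.
    """
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    memory = [[0.0] * d for _ in range(n)]  # last gradient seen for each term
    grad_sum = [0.0] * d                    # running sum of the stored gradients

    for _ in range(n_iters):
        i = rng.randrange(n)
        # Gradient of the i-th term at the current iterate.
        residual = sum(A[i][j] * x[j] for j in range(d)) - b[i]
        new_grad = [residual * A[i][j] for j in range(d)]
        # Swap the old stored gradient for the new one in the running sum.
        for j in range(d):
            grad_sum[j] += new_grad[j] - memory[i][j]
        memory[i] = new_grad
        # Step along the average of the stored gradients.
        for j in range(d):
            x[j] -= step * grad_sum[j] / n
    return x


# Hypothetical toy data: four terms in two dimensions, with no exact fit.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b = [2.0, -1.0, 1.0, 4.0]
x_hat = sag_least_squares(A, b, step=1.0 / 32.0, n_iters=5000)
print(x_hat)  # should approach the least-squares solution [7/3, -4/3]
```

Each iteration touches one term, so the per-step cost matches that of a plain stochastic gradient step, at the price of storing one gradient per training example. The constant step size used here (1/32, chosen conservatively relative to the largest squared row norm) is a choice for this toy problem only.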

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization
Machine Learning > Optimization & Theory > Optimization
Mathematics & Optimization > Optimization > Stochastic Methods
Machine Learning > Optimization & Theory > Stochastic Methods
Machine Learning > Core Methods > Optimization
Mathematics & Optimization > Optimization > Convex Optimization

Keywords

stochastic gradient; stochastic gradient descent; convex optimization; strongly convex; stochastic gradient method; exponential convergence rate; strongly convex optimization; finite training sets; linear convergence; convergence rate; strongly convex function; exponential convergence; machine learning optimization; linear convergence rate
