Local Smoothness in Variance Reduced Optimization

Daniel Vainsencher; Han Liu; Tong Zhang

2015 NIPS NeurIPS 2015

Local Smoothness in Variance Reduced Optimization

Abstract

Abstract We propose a family of non-uniform sampling strategies to provably speed up a class of stochastic optimization algorithms with linear convergence including Stochastic Variance Reduced Gradient (SVRG) and Stochastic Dual Coordinate Ascent (SDCA). For a large family of penalized empirical risk minimization problems, our methods exploit data dependent local smoothness of the loss functions near the optimum, while maintaining convergence guarantees. Our bounds are the first to quantify the advantage gained from local smoothness which are significant for some problems significantly better. Empirically, we provide thorough numerical results to back up our theory. Additionally we present algorithms exploiting local smoothness in more aggressive ways, which perform even better in practice.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🐣 Hot Topic Early Bird — empirical risk minimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Daniel Vainsencher , Han Liu , Tong Zhang

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Machine Learning > Optimization & Theory > Optimization Mathematics & Optimization > Optimization > Stochastic Methods

Keywords

stochastic optimization empirical risk minimization variance reduction stochastic dual coordinate ascent local smoothness stochastic variance reduced gradient

Download PDF

Related papers

Data Generation as Sequential Decision Making 2015

A Recurrent Latent Variable Model for Sequential Data 2015

Combinatorial Cascading Bandits 2015

Accelerated Mirror Descent in Continuous and Discrete Time 2015

Matrix Completion with Noisy Side Information 2015