Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields

Mark Schmidt; Reza Babanezhad; Mohamed Ahmed; Aaron Defazio; Ann Clifton; Anoop Sarkar

AISTATS 2015

Abstract

We apply stochastic average gradient (SAG) algorithms to training conditional random fields (CRFs). We describe a practical implementation that uses structure in the CRF gradient to reduce the memory requirement of this linearly-convergent stochastic gradient method, propose a non-uniform sampling scheme that substantially improves practical performance, and analyze the rate of convergence of the SAGA variant under non-uniform sampling. Our experimental results reveal that our method significantly outperforms existing methods in terms of the training objective, and performs as well as or better than optimally-tuned stochastic gradient methods in terms of test error.
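As a rough illustration of the kind of update the abstract refers to, the sketch below runs a SAGA-style iteration with non-uniform sampling on a small synthetic ridge-regression problem, not on a CRF. Everything specific here is an assumption made for illustration and is not taken from the paper: the helper names (`make_problem`, `full_gradient`, `nonuniform_saga`), the sampling distribution (an even mix of uniform and Lipschitz-proportional probabilities), the importance weight `1 / (n * p[j])` that keeps the step unbiased, and the conservative step size. It also stores one full gradient per example, so it does not show the memory reduction the abstract mentions.

```python
import random


def make_problem(n=200, d=5, seed=1):
    """Synthetic least-squares data with rows of varying norm (illustrative only)."""
    rng = random.Random(seed)
    A, b = [], []
    for _ in range(n):
        scale = rng.uniform(0.5, 2.0)  # uneven row norms -> uneven Lipschitz constants
        a = [scale * rng.gauss(0.0, 1.0) for _ in range(d)]
        A.append(a)
        b.append(sum(a) + 0.01 * rng.gauss(0.0, 1.0))  # true weights are all ones
    return A, b


def full_gradient(A, b, x, lam):
    """Gradient of (1/n) * sum_i 0.5 * (a_i . x - b_i)^2 + (lam / 2) * ||x||^2."""
    n, d = len(A), len(x)
    g = [lam * xk for xk in x]
    for a, bi in zip(A, b):
        r = sum(ak * xk for ak, xk in zip(a, x)) - bi
        for k in range(d):
            g[k] += a[k] * r / n
    return g


def nonuniform_saga(A, b, lam=1e-2, epochs=200, seed=0):
    """SAGA-style updates with importance-weighted non-uniform sampling (a sketch)."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    # Per-example Lipschitz constants of the gradient of f_i.
    L = [sum(ak * ak for ak in a) + lam for a in A]
    total = sum(L)
    # Assumed sampling scheme: half uniform, half proportional to L_i.
    p = [0.5 / n + 0.5 * Li / total for Li in L]
    # Conservative step: the importance-weighted terms are at most 2 * mean(L)-smooth.
    alpha = 1.0 / (6.0 * total / n)
    x = [0.0] * d
    y = [[0.0] * d for _ in range(n)]  # stored gradient for each example
    y_avg = [0.0] * d                  # running average of the stored gradients
    for j in rng.choices(range(n), weights=p, k=epochs * n):
        r = sum(ak * xk for ak, xk in zip(A[j], x)) - b[j]
        g = [A[j][k] * r + lam * x[k] for k in range(d)]
        w = 1.0 / (n * p[j])           # importance weight for an unbiased step
        for k in range(d):
            diff = g[k] - y[j][k]
            x[k] -= alpha * (w * diff + y_avg[k])
            y_avg[k] += diff / n
        y[j] = g
    return x
```

Calling `nonuniform_saga(*make_problem())` returns the weight vector; passing it to `full_gradient` gives a quick check of how close the iterate is to a stationary point. Mixing in a uniform component keeps every sampling probability above `1 / (2n)`, which bounds the importance weights by 2 and avoids very large steps on rarely sampled examples.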

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Optimization & Theory > Stochastic Processes

Keywords

stochastic gradient, non-uniform sampling, convergence rate, conditional random field, SAGA algorithm
