Distributed Stochastic Gradient MCMC

Sungjin Ahn; Babak Shahbaba; Max Welling

ICML 2014

Abstract

Probabilistic inference at big-data scale is becoming increasingly relevant to both the machine learning and statistics communities. Here we introduce the first fully distributed MCMC algorithm based on stochastic gradients. We argue that stochastic gradient MCMC algorithms are particularly suited for distributed inference because individual chains can draw minibatches from their local pool of data for a flexible amount of time before jumping to or syncing with other chains. This greatly reduces communication overhead and allows adaptive load balancing. Our experiments for LDA on Wikipedia and PubMed show that, relative to the state of the art in distributed MCMC, we reduce the compute time needed to reach the same perplexity level from 27 hours to half an hour.
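
The idea of a chain that draws minibatches from one worker's local data for a while and then jumps to another worker can be illustrated with a small single-process sketch. This is not the authors' implementation: it simulates the workers as list shards, uses a toy Gaussian-mean model instead of LDA, and assumes equally sized shards with uniformly random hops so that the usual N/n gradient scaling applies. The names `make_shards`, `sgld_hopping_chain`, and `trajectory_len` are hypothetical and chosen only for illustration.

```python
import math
import random


def make_shards(data, num_workers):
    """Split data into equally sized shards, one per simulated worker."""
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]


def sgld_hopping_chain(shards, num_steps, step_size=1e-4, batch_size=10,
                       trajectory_len=50, prior_var=100.0, seed=0):
    """Stochastic gradient Langevin chain that hops between data shards.

    Toy model (an assumption of this sketch): x_i ~ N(theta, 1) with
    prior theta ~ N(0, prior_var). Returns the list of visited thetas.
    """
    rng = random.Random(seed)
    total = sum(len(s) for s in shards)  # N: number of data points in use
    theta = 0.0
    worker = rng.randrange(len(shards))
    samples = []
    for t in range(num_steps):
        # After a fixed number of local steps, jump to a random worker.
        if t > 0 and t % trajectory_len == 0:
            worker = rng.randrange(len(shards))
        # The minibatch comes only from the current worker's local data.
        batch = rng.sample(shards[worker], batch_size)
        grad_prior = -theta / prior_var
        # Rescale the minibatch gradient by N/n (valid here because shards
        # are equally sized and visited uniformly at random).
        grad_lik = (total / batch_size) * sum(x - theta for x in batch)
        # Langevin update: half-step along the gradient plus N(0, step_size) noise.
        noise = rng.gauss(0.0, math.sqrt(step_size))
        theta += 0.5 * step_size * (grad_prior + grad_lik) + noise
        samples.append(theta)
    return samples


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(2.0, 1.0) for _ in range(400)]
    samples = sgld_hopping_chain(make_shards(data, 4), num_steps=20000)
    kept = samples[2000:]  # discard burn-in
    print("chain mean:", sum(kept) / len(kept))
    print("data mean: ", sum(data) / len(data))
```

In this sketch a larger `trajectory_len` means fewer hops, which would correspond to less communication in a real distributed setting, at the cost of the chain drifting further toward the statistics of a single shard between hops.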

Topics

Machine Learning > Optimization & Theory > Bayesian Inference
Machine Learning > Optimization & Theory > Distributed Learning
Machine Learning > Optimization & Theory > Stochastic Methods
Machine Learning > Bayesian & Probabilistic > Bayesian Inference

Keywords

stochastic gradient, Bayesian learning, latent Dirichlet allocation, probabilistic inference, topic modeling, Markov chain Monte Carlo, distributed inference, stochastic gradient Markov chain Monte Carlo
