Asynchronous Distributed Learning of Topic Models

Padhraic Smyth; Max Welling; Arthur U. Asuncion

NeurIPS (NIPS) 2008

Abstract

Distributed learning is a problem of fundamental interest in machine learning and cognitive science. In this paper, we present asynchronous distributed learning algorithms for two well-known unsupervised learning frameworks: Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Processes (HDP). In the proposed approach, the data are distributed across P processors, and processors independently perform Gibbs sampling on their local data and communicate their information in a local asynchronous manner with other processors. We demonstrate that our asynchronous algorithms are able to learn global topic models that are statistically as accurate as those learned by the standard LDA and HDP samplers, but with significant improvements in computation time and memory. We show speedup results on a 730-million-word text corpus using 32 processors, and we provide perplexity results for up to 1500 virtual processors. As a stepping stone in the development of asynchronous HDP, a parallel HDP sampler is also introduced.
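The asynchronous scheme described in the abstract can be sketched in a few lines of Python. This is a simplified single-process simulation, not the authors' implementation: the `Worker` class, the `run_async_lda` function, the hyperparameter values, and the rule of letting one random pair of workers exchange counts per round are all assumptions made for illustration. Each simulated processor runs collapsed Gibbs sampling for LDA on its own shard, combining its local topic-word counts with a cached estimate of the counts held elsewhere, and refreshes that estimate whenever it meets a peer.

```python
import random


class Worker:
    """One simulated processor holding a shard of documents (hypothetical helper)."""

    def __init__(self, docs, n_topics, vocab_size, alpha, beta, rng):
        self.docs, self.K, self.W = docs, n_topics, vocab_size
        self.alpha, self.beta, self.rng = alpha, beta, rng
        # Topic-word counts from this worker's own tokens.
        self.local_wk = [[0] * n_topics for _ in range(vocab_size)]
        # Per-document topic counts (never shared).
        self.doc_k = [[0] * n_topics for _ in docs]
        # Last counts received from each peer, and their running sum.
        self.cache = {}
        self.other_wk = [[0] * n_topics for _ in range(vocab_size)]
        # Random initial topic assignment for every token.
        self.z = []
        for d, doc in enumerate(docs):
            zd = []
            for w in doc:
                k = rng.randrange(n_topics)
                zd.append(k)
                self.local_wk[w][k] += 1
                self.doc_k[d][k] += 1
            self.z.append(zd)

    def sweep(self):
        """One pass of collapsed Gibbs sampling over the local shard."""
        K, W = self.K, self.W
        # Per-topic totals under the current estimate of the global counts.
        tot = [sum(self.local_wk[w][k] + self.other_wk[w][k] for w in range(W))
               for k in range(K)]
        for d, doc in enumerate(self.docs):
            for i, w in enumerate(doc):
                k = self.z[d][i]
                # Remove the current token from the counts.
                self.local_wk[w][k] -= 1
                self.doc_k[d][k] -= 1
                tot[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (self.local_wk[w][t] + self.other_wk[w][t] + self.beta)
                    / (tot[t] + W * self.beta)
                    * (self.doc_k[d][t] + self.alpha)
                    for t in range(K)
                ]
                k = self.rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                self.z[d][i] = k
                self.local_wk[w][k] += 1
                self.doc_k[d][k] += 1
                tot[k] += 1

    def receive(self, peer_id, peer_counts):
        """Swap the stale cached counts of a peer for its fresh counts."""
        old = self.cache.get(peer_id)
        for w in range(self.W):
            for k in range(self.K):
                if old is not None:
                    self.other_wk[w][k] -= old[w][k]
                self.other_wk[w][k] += peer_counts[w][k]
        self.cache[peer_id] = [row[:] for row in peer_counts]


def run_async_lda(shards, n_topics, vocab_size, n_rounds=20,
                  alpha=0.1, beta=0.01, seed=0):
    """Simulate local sampling with pairwise, unsynchronized count exchange."""
    rng = random.Random(seed)
    workers = [Worker(s, n_topics, vocab_size, alpha, beta, rng) for s in shards]
    for _ in range(n_rounds):
        for wk in workers:
            wk.sweep()
        # Assumed communication rule: one random pair meets per round.
        a, b = rng.sample(range(len(workers)), 2)
        workers[a].receive(b, workers[b].local_wk)
        workers[b].receive(a, workers[a].local_wk)
    return workers
```

Documents here are lists of integer word ids, and `shards` is a list with one document list per simulated processor. Caching one count matrix per peer is the simplest way to keep the "everyone else" estimate consistent when the same peer is met again; it costs memory proportional to the number of peers met, which is a trade-off of this sketch rather than a claim about the paper's design.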

Topics

Artificial Intelligence > Bayesian & Probabilistic > Bayesian Learning
Machine Learning > Learning Types > Unsupervised Learning
Machine Learning > Optimization & Theory > Distributed Learning
Natural Language Processing > Resources & Methods > Text Representation
Machine Learning > Bayesian & Probabilistic > Probabilistic Modeling
Machine Learning > Learning Types > Federated Learning
Machine Learning > Learning Types > Distributed Learning
Natural Language Processing > Applications > Topic Modeling

Keywords

hierarchical Dirichlet processes; latent Dirichlet allocation; distributed learning; Gibbs sampling; hierarchical Dirichlet process; asynchronous computation; topic model; asynchronous algorithm
