Sparse Parallel Training of Hierarchical Dirichlet Process Topic Models

Alexander Terenin; Måns Magnusson; Leif Jonsson

EMNLP 2020

Abstract

To scale non-parametric extensions of probabilistic topic models such as latent Dirichlet allocation to larger data sets, practitioners increasingly rely on parallel and distributed systems. In this work, we study data-parallel training for the hierarchical Dirichlet process (HDP) topic model. Based on a representation of certain conditional distributions within an HDP, we propose a doubly sparse data-parallel sampler for the HDP topic model. This sampler exploits all available sources of sparsity found in natural language, which is an important way to make computation efficient. We benchmark our method on a well-known corpus (PubMed) with 8 million documents and 768 million tokens, training on a single multi-core machine in under four days.
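
The abstract does not spell out the sampler, but the idea of exploiting two sources of sparsity can be illustrated with a minimal sketch. It assumes a partially collapsed conditional for a token's topic indicator of the form p(z = k) ∝ φ_{k,w} (α ψ_k + m_{d,k}), where φ_{k,w} is the probability of word w under topic k, ψ_k is a global topic weight, α is a concentration parameter, and m_{d,k} counts the tokens in document d currently assigned to topic k. These symbols and the function name `sparse_topic_draw` are illustrative assumptions, not the authors' notation or implementation. The sketch splits the unnormalized mass into a document bucket (nonzero only where both φ_{k,w} and m_{d,k} are nonzero) and a prior bucket (nonzero only where φ_{k,w} is nonzero), so the work per draw scales with the number of nonzero entries rather than with the total number of topics.

```python
import random
from typing import Dict, List, Tuple


def sparse_topic_draw(
    phi_w: Dict[int, float],  # nonzero entries of phi[:, w], keyed by topic id (assumed sparse)
    m_d: Dict[int, int],      # nonzero topic counts for document d (assumed sparse)
    psi: List[float],         # global topic weights, indexed by topic id
    alpha: float,             # concentration parameter
    rng: random.Random,
) -> int:
    """Hypothetical sketch: draw k with probability proportional to
    phi_w[k] * (alpha * psi[k] + m_d[k]), visiting only nonzero entries."""
    # Document bucket: iterate over the smaller dict, keep topics present in both.
    small, large = (phi_w, m_d) if len(phi_w) <= len(m_d) else (m_d, phi_w)
    doc_terms: List[Tuple[int, float]] = [
        (k, phi_w[k] * m_d[k]) for k in small if k in large
    ]
    doc_mass = sum(w for _, w in doc_terms)

    # Prior bucket: only topics where phi_w[k] is nonzero contribute.
    prior_terms: List[Tuple[int, float]] = [
        (k, v * alpha * psi[k]) for k, v in phi_w.items()
    ]
    prior_mass = sum(w for _, w in prior_terms)

    total = doc_mass + prior_mass
    if total <= 0.0:
        raise ValueError("all topics have zero probability for this token")

    # Pick a bucket by its mass, then walk that bucket's entries.
    u = rng.random() * total
    if u < doc_mass:
        terms = doc_terms
    else:
        terms = prior_terms
        u -= doc_mass
    for k, w in terms:
        u -= w
        if u < 0.0:
            return k
    return terms[-1][0]  # guard against floating-point round-off
```

For example, with `phi_w = {0: 0.5, 3: 0.2}`, `m_d = {3: 4, 7: 1}` and `alpha = 1.0`, only topics 0 and 3 can ever be drawn: topic 7 appears in the document but has zero probability for this word, and every other topic has φ_{k,w} = 0. A production sampler would typically precompute the prior bucket per word (for instance with an alias table) instead of rebuilding it for every token; the sketch keeps both buckets explicit for readability.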
