Fast Large-scale Mixture Modeling with Component-specific Data Partitions

Bo Thiesson; Chong Wang

NeurIPS (NIPS) 2010

Abstract

Remarkably easy implementation and guaranteed convergence have made the EM algorithm one of the most widely used algorithms for mixture modeling. On the downside, the cost of the E-step is linear in both the sample size and the number of mixture components, making it impractical for large-scale data. Based on the variational EM framework, we propose a fast alternative that uses component-specific data partitions to obtain an E-step that is sub-linear in the sample size, while the algorithm still maintains provable convergence. Our approach builds on previous work, but is significantly faster and scales much better in the number of mixture components. We demonstrate this speedup by experiments on large-scale synthetic and real data.
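To make the cost argument concrete, the sketch below contrasts a standard E-step with a block-constrained variational E-step for a one-dimensional Gaussian mixture. It is an illustration under stated assumptions, not the authors' algorithm: the function names (`e_step`, `block_e_step`), the 1-D Gaussian components, and the single partition shared by all components are assumptions made here for brevity, whereas the paper proposes component-specific partitions. The standard E-step evaluates one density per (point, component) pair, so it costs O(N*K). If responsibilities are instead constrained to be constant within each block of data, each block needs only its cached mean of x and mean of x^2, and the cost drops to O(B*K) for B blocks.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(x | mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def _normalize(log_w):
    """Convert unnormalized log weights to probabilities (log-sum-exp)."""
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def e_step(data, weights, means, variances):
    """Standard E-step: one responsibility vector per data point, O(N*K)."""
    return [
        _normalize([math.log(w) + log_gauss(x, m, v)
                    for w, m, v in zip(weights, means, variances)])
        for x in data
    ]


def block_stats(block):
    """Summarize a block of points by (mean of x, mean of x**2)."""
    n = len(block)
    return (sum(block) / n, sum(x * x for x in block) / n)


def block_e_step(stats, weights, means, variances):
    """Block-constrained variational E-step, O(B*K).

    Assumption (hypothetical simplification): all components share one
    partition. With q constant inside a block, the optimal q is
    proportional to weight * exp(average log density over the block),
    and for a Gaussian that average depends only on the block statistics.
    """
    result = []
    for mean_x, mean_x2 in stats:
        log_w = []
        for w, m, v in zip(weights, means, variances):
            # Average of (x - m)**2 over the block, from cached statistics.
            avg_sq = mean_x2 - 2.0 * m * mean_x + m * m
            avg_log_p = -0.5 * (math.log(2.0 * math.pi * v) + avg_sq / v)
            log_w.append(math.log(w) + avg_log_p)
        result.append(_normalize(log_w))
    return result


data = [-2.1, -1.9, 2.0, 2.2]
weights, means, variances = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]

full = e_step(data, weights, means, variances)             # 4 vectors
coarse = block_e_step([block_stats(data[:2]), block_stats(data[2:])],
                      weights, means, variances)           # 2 vectors
```

With singleton blocks, `block_e_step` reproduces `e_step`, which is a quick sanity check on the block statistics. Coarser blocks trade a looser variational bound for fewer density evaluations.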

Topics

Machine Learning > Core Methods > Clustering
Machine Learning > Learning Types > Unsupervised Learning
Machine Learning > Optimization & Theory > Optimization
Data Science & Analytics > Applications > Clustering
Machine Learning > Bayesian & Probabilistic > Variational Inference

Keywords

variational inference, EM algorithm, large-scale clustering, variational EM, data partitions, expectation-maximization algorithm, mixture modeling, mixture model, large-scale data, component-specific partition
