PG-means: learning the number of clusters in data

Yu Feng; Greg Hamerly

NIPS (NeurIPS) 2006

Abstract

We present a novel algorithm called PG-means which is able to learn the number of clusters in a classical Gaussian mixture model. Our method is robust and efficient; it uses statistical hypothesis tests on one-dimensional projections of the data and model to determine if the examples are well represented by the model. In so doing, we are applying a statistical test for the entire model at once, not just on a per-cluster basis. We show that our method works well in difficult cases such as non-Gaussian data, overlapping clusters, eccentric clusters, high dimension, and many true clusters. Further, our new method provides a much more stable estimate of the number of clusters than existing methods.
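The core idea of testing one-dimensional projections of both the data and the model can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the test, so the Kolmogorov-Smirnov statistic is assumed here, and the function names (`project_mixture`, `ks_statistic`, `random_unit_vector`) and the toy data are hypothetical. The sketch also skips the model fitting step and takes the mixture parameters as given.

```python
import math
import random


def project_mixture(weights, means, covs, direction):
    """Project a d-dimensional Gaussian mixture onto a unit direction.

    Each component N(mu, Sigma) becomes the 1-D Gaussian
    N(p.mu, p^T Sigma p). Returns a list of (weight, mean, std).
    """
    dim = len(direction)
    comps = []
    for w, mu, cov in zip(weights, means, covs):
        m = sum(p * x for p, x in zip(direction, mu))
        var = sum(direction[i] * cov[i][j] * direction[j]
                  for i in range(dim) for j in range(dim))
        comps.append((w, m, math.sqrt(var)))
    return comps


def mixture_cdf(x, comps):
    """CDF of a 1-D Gaussian mixture at x."""
    return sum(w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
               for w, m, s in comps)


def ks_statistic(points, weights, means, covs, direction):
    """KS distance between projected data and the projected model.

    Assumption: KS is used as the one-dimensional test statistic.
    """
    comps = project_mixture(weights, means, covs, direction)
    proj = sorted(sum(p * x for p, x in zip(direction, pt)) for pt in points)
    n = len(proj)
    d = 0.0
    for i, x in enumerate(proj):
        f = mixture_cdf(x, comps)
        # Compare the model CDF with the empirical CDF on both sides of x.
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d


def random_unit_vector(dim, rng):
    """Draw a random projection direction (assumed choice of projection)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


# Toy data: two well-separated 2-D clusters (hypothetical example).
rng = random.Random(0)
data = [[rng.gauss(-5.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
data += [[rng.gauss(5.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]

identity = [[1.0, 0.0], [0.0, 1.0]]
direction = [1.0, 0.0]  # fixed here for clarity; random in general

# A single Gaussian with matching overall mean and covariance.
d_one = ks_statistic(data, [1.0], [[0.0, 0.0]],
                     [[[26.0, 0.0], [0.0, 1.0]]], direction)
# The two-component mixture that generated the data.
d_two = ks_statistic(data, [0.5, 0.5], [[-5.0, 0.0], [5.0, 0.0]],
                     [identity, identity], direction)
print("k=1 KS statistic:", d_one)
print("k=2 KS statistic:", d_two)
```

In this toy setting the one-component model fits the projected data far worse than the two-component model, which is the kind of evidence a whole-model test would use to decide that more clusters are needed. A complete procedure would also fit the mixture (for example with expectation maximization), repeat the test over several projections, and compare the statistic against a critical value.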

Topics

Machine Learning > Core Methods > Clustering
Machine Learning > Learning Types > Unsupervised Learning
Machine Learning > Optimization & Theory > Statistical Learning
Data Science & Analytics > Applications > Clustering
Machine Learning > Learning Paradigms > Unsupervised Learning

Keywords

unsupervised learning, model selection, expectation maximization, cluster number estimation, hypothesis testing, statistical testing, gaussian mixture model, hypothesis test, cluster number, cluster count
