Fast and Accurate $k$-means++ via Rejection Sampling

Vincent Cohen-Addad; Silvio Lattanzi; Ashkan Norouzi-Fard; Christian Sohler; Ola Svensson

NeurIPS 2020

Abstract

$k$-means++ \cite{arthur2007k} is a widely used clustering algorithm that is easy to implement, has strong theoretical guarantees, and performs well empirically. Despite its wide adoption, $k$-means++ can be slow on large datasets, so a natural question is whether more efficient algorithms with similar guarantees exist. In this paper, we present such an algorithm: a near-linear-time algorithm for $k$-means++ seeding. Interestingly, our algorithm obtains the same theoretical guarantees as $k$-means++ and significantly improves on earlier results on fast $k$-means++ seeding. Moreover, we show empirically that our algorithm is significantly faster than $k$-means++ and obtains solutions of equivalent quality.
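For context, the baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of standard $k$-means++ seeding ($D^2$ sampling) as commonly described, not the faster rejection-sampling algorithm of this paper. The function names `sq_dist` and `kmeanspp_seeding` are hypothetical and chosen only for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeding(points, k, rng=None):
    """Pick k initial centers by D^2 sampling (standard k-means++ seeding)."""
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        total = sum(d2)
        if total == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            nxt = rng.choice(points)
        else:
            # Sample a point with probability proportional to d2[i].
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Refresh all n distances after each new center. Repeating this full
        # pass for each of the k centers is what makes the baseline slow on
        # large inputs.
        d2 = [min(old, sq_dist(p, nxt)) for old, p in zip(d2, points)]
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 0.5)]
    print(kmeanspp_seeding(data, 3, random.Random(0)))
```

Each new center triggers a pass over all $n$ points, so the sketch performs on the order of $nk$ distance evaluations in total. That cost is the bottleneck a near-linear-time seeding procedure aims to avoid.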

Topics

Machine Learning > Core Methods > Clustering
Data Science & Analytics > Applications > Clustering

Keywords

clustering algorithm; rejection sampling; seeding algorithm; k-means++ clustering; near linear time
