Streaming k-means approximation

Nir Ailon; Ragesh Jaiswal; Claire Monteleoni

NeurIPS (NIPS) 2009

Abstract

We provide a clustering algorithm that approximately optimizes the k-means objective in the one-pass streaming setting. We make no assumptions about the data, and our algorithm is very lightweight in terms of memory and computation. This setting is applicable to unsupervised learning on massive data sets or on resource-constrained devices. The two main ingredients of our theoretical work are: a derivation of an extremely simple pseudo-approximation batch algorithm for k-means (based on the recent k-means++), in which the algorithm is allowed to output more than k centers, and a streaming clustering algorithm in which batch clustering algorithms are performed on small inputs (fitting in memory) and combined in a hierarchical manner. Empirical evaluations on real and simulated data reveal the practical utility of our method.
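The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes plain k-means++-style D^2 sampling as the batch step, a two-level hierarchy, and an arbitrary oversampling factor. The names `d2_sample`, `assign_weights`, `streaming_kmeans`, `chunk_size`, and `oversample` are all hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def d2_sample(points, weights, m, rng):
    """Pick up to m centers by weighted D^2 sampling (k-means++-style seeding)."""
    if not points:
        return []
    m = min(m, len(points))
    first = rng.choices(range(len(points)), weights=weights, k=1)[0]
    centers = [points[first]]
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < m:
        probs = [w * d for w, d in zip(weights, d2)]
        if sum(probs) == 0:  # every point already coincides with a center
            break
        idx = rng.choices(range(len(points)), weights=probs, k=1)[0]
        centers.append(points[idx])
        d2 = [min(d, sq_dist(p, points[idx])) for d, p in zip(d2, points)]
    return centers

def assign_weights(points, weights, centers):
    """Give each center the total weight of the points nearest to it."""
    out = [0.0] * len(centers)
    for p, w in zip(points, weights):
        j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
        out[j] += w
    return out

def streaming_kmeans(stream, k, chunk_size, oversample=3, seed=0):
    """One pass over the stream, holding one chunk plus a weighted summary."""
    rng = random.Random(seed)
    summary_pts, summary_wts = [], []
    chunk = []

    def flush():
        if not chunk:
            return
        ones = [1.0] * len(chunk)
        # Assumption: keep oversample * k centers per chunk, i.e. more than k.
        centers = d2_sample(chunk, ones, oversample * k, rng)
        summary_pts.extend(centers)
        summary_wts.extend(assign_weights(chunk, ones, centers))
        chunk.clear()

    for p in stream:
        chunk.append(p)
        if len(chunk) == chunk_size:
            flush()
    flush()  # handle the final partial chunk
    # Second level: reduce the weighted summary to k centers.
    return d2_sample(summary_pts, summary_wts, k, rng)
```

A call such as `streaming_kmeans(points, k=3, chunk_size=40)` returns at most `k` centers drawn from the input. In this sketch each chunk is summarized by weighted centers so that the raw points can be discarded, and the summary is clustered once more; a deeper hierarchy would repeat the same step whenever the summary itself outgrows memory.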

Topics

Machine Learning > Core Methods > Clustering
Machine Learning > Optimization & Theory > Optimization
Data Science & Analytics > Applications > Clustering
Computer Science > Foundations > Algorithms
Machine Learning > Optimization & Theory > Online Algorithms
Machine Learning > Learning Paradigms > Unsupervised Learning

Keywords

unsupervised learning, k-means clustering, hierarchical clustering, clustering approximation, streaming algorithm, data stream, approximation algorithm, one-pass streaming
