Clustering Stable Instances of Euclidean k-means

Aravindan Vijayaraghavan; Abhratanu Dutta; Alex Wang

NIPS 2017 (NeurIPS 2017)

Abstract

The Euclidean k-means problem is arguably the most widely studied clustering problem in machine learning. While optimizing the k-means objective is NP-hard in the worst case, practitioners have enjoyed remarkable success applying heuristics such as Lloyd's algorithm to this problem. To address this disconnect, we study the following question: what properties of real-world instances enable us to design efficient algorithms and prove guarantees for finding the optimal clustering? We consider a natural notion called additive perturbation stability that we believe captures many practical instances of Euclidean k-means clustering. Stable instances have a unique optimal k-means solution that does not change even when each point is perturbed by a small amount (in Euclidean distance). This captures the property that the optimal k-means solution should be tolerant to measurement errors and uncertainty in the points. We design efficient algorithms that provably recover the optimal clustering for instances that are additive perturbation stable. When the instance has some additional separation, we design a simple, efficient algorithm with provable guarantees that is also robust to outliers. We complement these results by studying the amount of stability in real datasets and demonstrating that our algorithm performs well on these benchmark datasets.
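To make the stability notion concrete, here is a minimal sketch, not the authors' algorithm. It computes the optimal k-means partition of a tiny point set by brute force, then checks whether that partition survives random perturbations in which every point moves by a vector of Euclidean norm at most delta. The function names, the absolute budget delta, and the random-sampling scheme are all illustrative assumptions. Passing the check does not certify additive perturbation stability, which must hold for every bounded perturbation rather than a random sample; the brute-force search is exponential in the number of points and only suitable for toy inputs.

```python
import itertools
import math
import random


def kmeans_cost(points, labels, k):
    """k-means objective: sum of squared Euclidean distances to cluster means."""
    cost = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            return math.inf  # require exactly k non-empty clusters
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        cost += sum(sum((p[d] - mean[d]) ** 2 for d in range(dim)) for p in members)
    return cost


def optimal_partition(points, k):
    """Brute-force optimal k-means clustering (k**n labelings; toy inputs only)."""
    best_cost, best = math.inf, None
    for labels in itertools.product(range(k), repeat=len(points)):
        cost = kmeans_cost(points, labels, k)
        if cost < best_cost:
            best_cost, best = cost, labels
    # Represent the clustering as a set of index sets so that relabelings compare equal.
    return frozenset(
        frozenset(i for i, l in enumerate(best) if l == c) for c in range(k)
    )


def random_perturbation(points, delta, rng):
    """Hypothetical helper: move each point by a random vector of norm <= delta."""
    out = []
    for p in points:
        g = [rng.gauss(0.0, 1.0) for _ in p]
        norm = math.sqrt(sum(x * x for x in g)) or 1.0
        r = delta * rng.random()  # random radius in [0, delta)
        out.append(tuple(x + r * gi / norm for x, gi in zip(p, g)))
    return out


def survives_random_perturbations(points, k, delta, trials=50, seed=0):
    """Heuristic probe only: True if the optimal partition is unchanged under
    `trials` random delta-bounded perturbations (not a proof of stability)."""
    rng = random.Random(seed)
    base = optimal_partition(points, k)
    return all(
        optimal_partition(random_perturbation(points, delta, rng), k) == base
        for _ in range(trials)
    )


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(sorted(sorted(c) for c in optimal_partition(pts, 2)))
    print(survives_random_perturbations(pts, 2, delta=0.5))
```

For two well-separated groups like the example points, small perturbations cannot change which points belong together, so the probe reports the partition as unchanged. Instances where the optimal clustering is nearly tied between two partitions are the ones in which small moves can flip it.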



Topics

Machine Learning > Core Methods > Clustering
Machine Learning > Optimization & Theory > Learning Theory
Data Science & Analytics > Applications > Clustering
Mathematics & Optimization > Optimization > Combinatorial Optimization
Computer Science > Foundations > Algorithms
Mathematics & Optimization > Optimization > Theory

Keywords

k-means clustering, clustering stability, perturbation analysis, optimal clustering, Euclidean distance, clustering algorithm, perturbation stability, additive perturbation, Euclidean k-means

