Unsupervised Feature Selection for the $k$-means Clustering Problem

Christos Boutsidis; Petros Drineas; Michael W. Mahoney

NeurIPS (NIPS) 2009

Abstract

We present a novel feature selection algorithm for the $k$-means clustering problem. Our algorithm is randomized. Given an accuracy parameter $\epsilon \in (0,1)$, it selects $\Theta(k \log(k / \epsilon) / \epsilon^2)$ features, in an unsupervised manner, from a dataset of arbitrary dimension and appropriately rescales them. We prove that if we run any $\gamma$-approximate $k$-means algorithm ($\gamma \geq 1$) on the features selected by our method, we find a $(1+(1+\epsilon)\gamma)$-approximate partition with high probability.
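The abstract states what the selection achieves but not how features are sampled and rescaled. The following is a minimal NumPy sketch of one standard randomized scheme, offered under explicit assumptions. Columns (features) are sampled with replacement using probabilities proportional to the squared row norms of the top-$k$ right singular vectors (leverage scores). Each sampled column is then rescaled by $1/\sqrt{r p_j}$. This sketch computes those probabilities with an exact SVD. The paper's precise construction, including any faster approximation of this subspace, is not reproduced here. The helper names, the constant `c` hidden by the $\Theta(\cdot)$, and the synthetic data are hypothetical. The plain Lloyd's iteration is only a heuristic stand-in for the "$\gamma$-approximate $k$-means algorithm" and carries no $\gamma$ guarantee.

```python
import numpy as np


def num_features(k, eps, c=1.0):
    """r = ceil(c * k * log(k / eps) / eps**2).

    The constant c is a hypothetical placeholder: the abstract only gives
    the Theta(k log(k/eps) / eps^2) rate, not the constant.
    """
    return int(np.ceil(c * k * np.log(k / eps) / eps**2))


def select_features(A, k, r, rng):
    """Sample r columns of A (n points x d features) with replacement and rescale them.

    Assumption of this sketch: probabilities are leverage scores of the exact
    top-k right singular subspace, p_j = ||V_k[j, :]||^2 / k.
    Returns the rescaled n x r sketch C and the sampled column indices.
    """
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k].T                      # d x k, orthonormal columns
    p = np.sum(Vk**2, axis=1)          # sums to k in exact arithmetic
    p = p / p.sum()                    # normalize to guard against round-off
    idx = rng.choice(A.shape[1], size=r, replace=True, p=p)
    C = A[:, idx] / np.sqrt(r * p[idx])  # rescale each sampled column
    return C, idx


def kmeans_cost(A, labels, k):
    """k-means objective: sum of squared distances to cluster means."""
    cost = 0.0
    for c in range(k):
        pts = A[labels == c]
        if len(pts):
            cost += np.sum((pts - pts.mean(axis=0)) ** 2)
    return cost


def lloyd(X, k, rng, iters=50):
    """Plain Lloyd's iterations from random initial centers (heuristic, no gamma bound)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels


rng = np.random.default_rng(0)
# Hypothetical synthetic data: 3 Gaussian blobs in 200 dimensions.
k, n_per, d = 3, 50, 200
means = rng.normal(scale=10.0, size=(k, d))
A = np.vstack([m + rng.normal(size=(n_per, d)) for m in means])
truth = np.repeat(np.arange(k), n_per)

r = num_features(k, eps=0.5)           # 22 with the placeholder c = 1
C, idx = select_features(A, k, r, rng)
labels = lloyd(C, k, rng)              # cluster on the r sketched features only
print(C.shape, kmeans_cost(A, labels, k), kmeans_cost(A, truth, k))
```

The partition is computed on the $r$ rescaled features. It is then scored with `kmeans_cost` on the original $d$ features, because the $(1+(1+\epsilon)\gamma)$ bound in the abstract refers to that objective.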

Keywords

unsupervised learning, k-means clustering, feature selection, unsupervised feature selection, feature rescaling, randomized algorithm, approximation bound