A Faster Sampling Algorithm for Spherical $k$-means

Rameshwar Pratap; Anup Deshmukh; Pratheeksha Nair; Tarun Dutt

ACML 2018

Abstract

The Spherical $k$-means algorithm proposed by Dhillon and Modha (2001) is a popular algorithm for clustering high-dimensional datasets. Although the algorithm is simple and easy to implement, it provides no provable guarantee on the quality of the clustering it returns. Endo and Miyamoto (2015) suggest an adaptive-sampling-based algorithm (Spherical $k$-means$++$) that gives near-optimal results with high probability. However, their algorithm requires $k$ sequential passes over the entire dataset, which may not be feasible when the dataset and/or the value of $k$ is large. In this work, we propose a Markov chain based sampling algorithm that takes only one pass over the data and gives a clustering close to optimal, similar to Spherical $k$-means$++$; that is, it is faster while maintaining almost the same approximation quality. We present a theoretical analysis of the algorithm and complement it with rigorous experiments on real-world datasets. The proposed algorithm is simple, easy to implement, and can be readily adopted in practice.
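As a rough illustration of what Markov chain based seeding can look like, the following Python sketch draws each seed with a short Metropolis-Hastings chain instead of a full scan of the data. It is not the authors' procedure: the uniform proposal, the dissimilarity $1 - \langle x, c \rangle$ on unit vectors, and the names `mcmc_seeding` and `chain_length` are assumptions made for this example, and the abstract does not specify how the one-pass property is obtained.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [a / norm for a in v]


def cosine_dissimilarity(x, c):
    """Assumed dissimilarity for unit vectors: 1 - <x, c>, clamped at 0."""
    return max(0.0, 1.0 - sum(a * b for a, b in zip(x, c)))


def nearest_dissimilarity(x, centers):
    """Dissimilarity from x to its closest already-chosen center."""
    return min(cosine_dissimilarity(x, c) for c in centers)


def mcmc_seeding(points, k, chain_length, rng):
    """Pick k seeds; each seed after the first is the end state of a
    Metropolis-Hastings chain whose target is proportional to the
    dissimilarity to the nearest chosen center (hypothetical sketch)."""
    data = [normalize(p) for p in points]
    centers = [rng.choice(data)]  # first seed: uniform at random
    for _ in range(k - 1):
        x = rng.choice(data)  # initial state of the chain
        dx = nearest_dissimilarity(x, centers)
        for _ in range(chain_length - 1):
            y = rng.choice(data)  # uniform proposal (an assumption)
            dy = nearest_dissimilarity(y, centers)
            # Accept with probability min(1, dy / dx).
            if dx <= 0.0 or dy / dx > rng.random():
                x, dx = y, dy
        centers.append(x)
    return centers


points = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [-1.0, 0.1]]
seeds = mcmc_seeding(points, k=3, chain_length=20, rng=random.Random(0))
```

Each chain evaluates only `chain_length` candidate points per seed, so the seeding cost in this sketch depends on the chain length rather than on the dataset size. Longer chains track the target distribution more closely at higher cost.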

Topics

Machine Learning > Core Methods > Clustering

Keywords

Markov chain, sequential sampling, clustering algorithm, spherical k-means
