ICML 2016

Clustering High Dimensional Categorical Data via Topographical Features

Abstract

Analysis of categorical data is a challenging task. In this paper, we propose to compute topographical features of high-dimensional categorical data. We present an efficient algorithm that extracts the modes of the underlying distribution and their attractive basins. These topographical features provide a geometric view of the data and can be applied to the visualization and clustering of challenging real-world datasets. Experiments show that our principled method outperforms state-of-the-art clustering methods and is also embarrassingly parallel.
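To make the notions of modes and attractive basins concrete, the following is a minimal illustrative sketch, not the algorithm of the paper. It assumes a simple geometric kernel over Hamming distance as the density estimate and a greedy ascent over observed points; the names `hamming`, `estimate_density`, `find_basins`, and the parameters `lam` and `radius` are all hypothetical.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def estimate_density(points, lam=0.5):
    """Assumed density estimate: each observed record q contributes
    count(q) * lam ** hamming(p, q) to the density at record p."""
    counts = Counter(points)
    return {
        p: sum(c * lam ** hamming(p, q) for q, c in counts.items())
        for p in counts
    }


def find_basins(points, lam=0.5, radius=1):
    """Greedy ascent: from each record, repeatedly move to the densest
    observed record within `radius` Hamming distance until no neighbor
    is strictly denser. The stopping record is treated as a mode, and
    the records that reach it form its basin."""
    dens = estimate_density(points, lam)

    def step(p):
        # Candidates include p itself, since hamming(p, p) == 0.
        best = max(
            (q for q in dens if hamming(p, q) <= radius),
            key=lambda q: (dens[q], q),
        )
        # Move only on a strict increase, which guarantees termination.
        return best if dens[best] > dens[p] else p

    basins = defaultdict(list)
    for p in points:
        cur = p
        while True:
            nxt = step(cur)
            if nxt == cur:
                break
            cur = nxt
        basins[cur].append(p)
    return dict(basins)


# Toy data: two groups of records that do not overlap within radius 1.
data = [("a", "x")] * 3 + [("a", "y")] + [("b", "z")] * 2 + [("c", "z")]
basins = find_basins(data)
```

In this toy dataset the ascent yields two modes, `("a", "x")` and `("b", "z")`, and each basin can be read as one cluster. Because the ascent from each record depends only on the precomputed density table, the loop over records could in principle be distributed across workers, which is one plausible reading of the embarrassingly parallel property.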
