Regularized K-means Through Hard-Thresholding

Jakob Raymaekers; Ruben H. Zamar

JMLR 2022

Abstract

We study a framework for performing regularized K-means, based on direct penalization of the size of the cluster centers. Different penalization strategies are considered and compared in a theoretical analysis and an extensive Monte Carlo simulation study. Based on the results, we propose a new method called hard-threshold K-means (HTK-means), which uses an ℓ0 penalty to induce sparsity. HTK-means is a fast and competitive sparse clustering method which is easily interpretable, as is illustrated on several real data examples. In this context, new graphical displays are presented and used to gain further insight into the data sets. © JMLR 2022.
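The idea of an ℓ0 penalty on the cluster centers can be sketched in a few lines. The code below is an illustrative sketch and not the authors' implementation: the function name `htk_means_sketch`, the unnormalized objective (total within-cluster squared error plus `lam` times the number of nonzero center coordinates), the random initialization, and the handling of empty clusters are all assumptions made for this example. The scaling of the penalty in the paper may differ. Under this assumed objective and with assignments held fixed, a center coordinate is either the cluster mean or zero, and keeping the mean is worthwhile only when it lowers the squared error by more than `lam`, which gives a hard-thresholding rule. The data are assumed to be centered so that zero is a meaningful reference value.

```python
import numpy as np


def htk_means_sketch(X, k, lam, n_iter=50, seed=0):
    """Lloyd-style alternation for an l0-penalized k-means objective.

    Assumed objective (illustrative, not taken from the paper):
        sum_i ||x_i - mu_{c(i)}||^2 + lam * (number of nonzero entries in mu)
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Hypothetical initialization: k distinct observations picked at random.
    centers = X[rng.choice(n, size=k, replace=False)].copy()

    def assign(centers):
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        new_centers = np.zeros_like(centers)
        for c in range(k):
            members = X[labels == c]
            if len(members) == 0:
                # Assumption: an empty cluster keeps its previous center.
                new_centers[c] = centers[c]
                continue
            mean = members.mean(axis=0)
            # Setting coordinate j to zero instead of the mean raises the
            # squared error by n_c * mean_j**2; keep the mean only if that
            # increase exceeds the penalty lam (hard thresholding).
            keep = len(members) * mean ** 2 > lam
            new_centers[c] = np.where(keep, mean, 0.0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # Final assignment so the labels match the returned centers.
    return centers, assign(centers)
```

A variable whose center coordinates are zero in every cluster does not contribute to separating the clusters, so the pattern of zeros in the returned centers can be read as a variable selection.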

Topics

Machine Learning > Core Methods > Clustering

Machine Learning > Learning Types > Unsupervised Learning

Data Science & Analytics > Applications > Clustering

Keywords

k-means clustering, hard thresholding, sparse clustering, regularized k-means, ℓ0 penalty, cluster center
