2025 JMLR JMLR 2025

Bayesian Sparse Gaussian Mixture Model for Clustering in High Dimensions

Abstract

We study the sparse high-dimensional Gaussian mixture model when the number of clusters is allowed to grow with the sample size. A minimax lower bound for parameter estimation is established, and we show that a constrained maximum likelihood estimator achieves the minimax lower bound. However, this optimization-based estimator is computationally intractable because the objective function is highly nonconvex and the feasible set involves discrete structures. To address the computational challenge, we propose a computationally tractable Bayesian approach to estimate high-dimensional Gaussian mixtures whose cluster centers exhibit sparsity using a continuous spike-and-slab prior. We further prove that the posterior contraction rate of the proposed Bayesian method is minimax optimal. The mis- clustering rate is obtained as a by-product using tools from matrix perturbation theory. The proposed Bayesian sparse Gaussian mixture model does not require pre-specifying the number of clusters, which can be adaptively estimated. The validity and usefulness of the proposed method is demonstrated through simulation studies and the analysis of a real-world single-cell RNA sequencing data set. [abs] [ pdf ][ bib ] © JMLR 2025. (edit, beta)

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization
🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio