Penalized Model-Based Clustering with Application to Variable Selection

Wei Pan; Xiaotong Shen

JMLR 2007

Abstract

Variable selection in clustering analysis is both challenging and important. In the context of model-based clustering analysis with a common diagonal covariance matrix, which is especially suitable for "high dimension, low sample size" settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method. © JMLR 2007.
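The abstract names the ingredients (L1-penalized likelihood, common diagonal covariance, an EM algorithm, thresholding, a modified BIC) without spelling out the updates. The following is a minimal sketch of how they can fit together, written under stated assumptions rather than taken from the paper. It assumes that the data are standardized column-wise and that the penalty is lam * sum_k sum_j |mu_kj| on the cluster means only. It also assumes that the M-step updates the means first and then the common variances, an ECM-style ordering. Under those assumptions, maximizing the expected complete-data log-likelihood in mu_kj plus the L1 term gives the weighted cluster mean passed through soft-thresholding with threshold lam * sigma_j^2 / sum_i tau_ik. That is where the "thresholding" and the sparse solution come from. The modified BIC here subtracts the number q of means shrunk exactly to zero from the usual parameter count. All function names (`penalized_em`, `modified_bic`, `soft_threshold`, and so on) are hypothetical, and this is an illustration, not the authors' code.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def standardize(X):
    """Center each column to mean 0 and scale it to (population) variance 1."""
    n, d = len(X), len(X[0])
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return [[(x[j] - means[j]) / sds[j] for j in range(d)] for x in X]


def log_normal_diag(x, mu, var):
    """Log density of N(mu, diag(var)) at the vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xj - mj) ** 2 / v
                      for xj, mj, v in zip(x, mu, var))


def e_step(X, pi, mu, var):
    """Posterior memberships tau[i][k] and the unpenalized log-likelihood."""
    tau, loglik = [], 0.0
    for x in X:
        logs = [math.log(p) + log_normal_diag(x, m, var) for p, m in zip(pi, mu)]
        top = max(logs)  # log-sum-exp for numerical stability
        s = sum(math.exp(l - top) for l in logs)
        loglik += top + math.log(s)
        tau.append([math.exp(l - top) / s for l in logs])
    return tau, loglik


def penalized_em(X, K, lam, n_iter=100, seed=0):
    """EM for a K-component Gaussian mixture with a common diagonal covariance
    and an L1 penalty lam * sum_k sum_j |mu[k][j]| on the cluster means.

    X should be standardized, so mu[k][j] == 0 means variable j does not
    distinguish cluster k from the overall mean.
    """
    n, d = len(X), len(X[0])
    rng = random.Random(seed)
    mu = [list(X[i]) for i in rng.sample(range(n), K)]  # start at K random rows
    var = [1.0] * d
    pi = [1.0 / K] * K
    for _ in range(n_iter):
        tau = e_step(X, pi, mu, var)[0]
        # Guard against empty components (a sketch-level safeguard).
        nk = [max(sum(t[k] for t in tau), 1e-10) for k in range(K)]
        pi = [c / n for c in nk]
        for k in range(K):
            for j in range(d):
                mean_kj = sum(tau[i][k] * X[i][j] for i in range(n)) / nk[k]
                # Penalized update: soft-threshold the weighted mean.
                mu[k][j] = soft_threshold(mean_kj, lam * var[j] / nk[k])
        # Common diagonal variances given the new means (floored for safety).
        var = [max(sum(tau[i][k] * (X[i][j] - mu[k][j]) ** 2
                       for i in range(n) for k in range(K)) / n, 1e-8)
               for j in range(d)]
    _, loglik = e_step(X, pi, mu, var)
    return pi, mu, var, loglik


def modified_bic(loglik, n, K, mu):
    """BIC whose parameter count drops the q means shrunk exactly to zero."""
    d = len(mu[0])
    q = sum(1 for row in mu for m in row if m == 0.0)
    d_e = (K - 1) + d + K * d - q  # proportions + variances + nonzero means
    return -2.0 * loglik + math.log(n) * d_e


if __name__ == "__main__":
    rng = random.Random(1)
    # 100 points in 10 dimensions; the two groups differ only in variables 0 and 1.
    raw = [[rng.gauss(2.0 if (i < 50 and j < 2) else 0.0, 1.0) for j in range(10)]
           for i in range(100)]
    X = standardize(raw)
    for lam in (0.0, 5.0, 20.0):
        pi, mu, var, ll = penalized_em(X, K=2, lam=lam)
        zeros = sum(m == 0.0 for row in mu for m in row)
        print(f"lambda={lam}: zero means={zeros}, modified BIC={modified_bic(ll, 100, 2, mu):.1f}")
```

As the penalty grows, more cluster means are set exactly to zero. A variable whose means are zero in every component drops out of the clustering. In this sketch one would fit a grid of (K, lam) pairs and keep the one with the smallest modified BIC, which mirrors the selection role the abstract gives the criterion.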

Topics

Machine Learning > Core Methods > Clustering
Machine Learning > Core Methods > Feature Selection

Keywords

variable selection, EM algorithm, model-based clustering, sparse solution, penalized likelihood, BIC, model selection
