Automatic PCA Dimension Selection for High Dimensional Data and Small Sample Sizes

David C. Hoyle

JMLR 2008

Abstract

Bayesian inference from high-dimensional data involves integration over a large number of model parameters. Accurate evaluation of such high-dimensional integrals raises a unique set of issues. These issues are illustrated using the exemplar of model selection for principal component analysis (PCA). A Bayesian model selection criterion, based on a Laplace approximation to the model evidence for determining the number of signal principal components present in a data set, has previously been shown to perform well on various test data sets. Using simulated data we show that for d-dimensional data and small sample sizes, N, the accuracy of this model selection method is strongly affected by increasing values of d. By taking proper account of the contribution to the evidence from the large number of model parameters we show that model selection accuracy is substantially improved. The accuracy of the improved model evidence is studied in the asymptotic limit d → ∞ at fixed ratio α = N/d, with α < 1. In this limit, model selection based upon the improved model evidence agrees with a frequentist hypothesis testing approach. © JMLR 2008.
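To make the regime in the abstract concrete, the following is a minimal sketch of simulated data with α = N/d < 1. It is not the paper's experimental setup or its model selection criterion: the function names, the choice of two signal components, and the variance values are all assumptions made for illustration. It only shows why the small-sample case is awkward for PCA: after mean-centering, the sample covariance of N points in d dimensions has at most N − 1 non-zero eigenvalues, so most of the d directions carry no sample information.

```python
import numpy as np


def simulate_spiked_data(n_samples, dim, signal_variances, noise_variance=1.0, seed=0):
    """Draw data with a few high-variance signal directions plus isotropic noise.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    k = len(signal_variances)
    # Random orthonormal signal directions (columns of `basis`) via QR.
    basis, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    # Latent scores scaled so component j has variance signal_variances[j].
    scores = rng.standard_normal((n_samples, k)) * np.sqrt(signal_variances)
    noise = rng.standard_normal((n_samples, dim)) * np.sqrt(noise_variance)
    return scores @ basis.T + noise


def sample_covariance_eigenvalues(data):
    """Eigenvalues of the sample covariance, in descending order.

    Uses the SVD of the centered data matrix, which avoids forming
    the d x d covariance matrix when d is much larger than N.
    """
    centered = data - data.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return singular_values**2 / data.shape[0]


N, d = 50, 500  # alpha = N / d = 0.1 < 1
X = simulate_spiked_data(N, d, signal_variances=[40.0, 20.0])
eigvals = sample_covariance_eigenvalues(X)

# Count eigenvalues that are non-zero up to numerical round-off.
n_nonzero = int(np.sum(eigvals > 1e-10 * eigvals[0]))
print("alpha =", N / d)
print("non-zero sample eigenvalues:", n_nonzero, "out of d =", d)
print("leading sample eigenvalues:", np.round(eigvals[:4], 2))
```

A dimension selection method operating in this regime has to decide how many of the leading sample eigenvalues reflect signal, using only those N − 1 non-zero values.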

Authors

David C. Hoyle

Topics

Machine Learning > Optimization & Theory > Bayesian Inference

Mathematics & Optimization > Mathematics > Statistics

Keywords

principal component analysis, Laplace approximation, Bayesian model selection, model evidence, dimension selection
