Generalization Error Bounds in Semi-supervised Classification Under the Cluster Assumption

Philippe Rigollet

JMLR 2007

Abstract

We consider semi-supervised classification, where part of the available data is unlabeled. Unlabeled data can help the classification problem when we make an assumption relating the behavior of the regression function to that of the marginal distribution. Seeger (2000) proposed the well-known cluster assumption as a reasonable one. We propose a mathematical formulation of this assumption and a method based on density level set estimation that takes advantage of it to achieve fast rates of convergence both in the number of unlabeled examples and in the number of labeled examples.
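To make the idea in the abstract concrete, here is a minimal one-dimensional sketch. It rests on assumptions not stated in the abstract: a Gaussian kernel density estimate built from the unlabeled points, clusters formed by linking retained points that lie within a distance `link` of each other, and a majority vote of the labeled points inside each cluster. The function names and the parameters `h`, `level`, and `link` are hypothetical, chosen only for illustration. The sketch is not the estimator analyzed in the paper.

```python
import math
from collections import Counter


def kde(x, sample, h):
    """Gaussian kernel density estimate at x (1-D) with bandwidth h."""
    norm = len(sample) * h * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / norm


def level_set_clusters(unlabeled, h, level, link):
    """Keep unlabeled points whose estimated density is at least `level`,
    then join consecutive retained points closer than `link`.
    Each cluster is returned as the interval (lo, hi) it spans."""
    kept = sorted(x for x in unlabeled if kde(x, unlabeled, h) >= level)
    groups = []
    for x in kept:
        if groups and x - groups[-1][-1] <= link:
            groups[-1].append(x)
        else:
            groups.append([x])
    return [(g[0], g[-1]) for g in groups]


def fit_cluster_labels(clusters, labeled):
    """Majority vote of the labeled points (x, y) falling in each cluster.
    A cluster with no labeled point gets the label None."""
    labels = []
    for lo, hi in clusters:
        votes = Counter(y for x, y in labeled if lo <= x <= hi)
        labels.append(votes.most_common(1)[0][0] if votes else None)
    return labels


def predict(x, clusters, labels):
    """Return the label of the cluster containing x, or None outside all clusters."""
    for (lo, hi), label in zip(clusters, labels):
        if lo <= x <= hi:
            return label
    return None


if __name__ == "__main__":
    # Two dense groups of unlabeled points plus one isolated low-density point.
    unlabeled = [i * 0.1 for i in range(11)] + [5 + i * 0.1 for i in range(11)] + [3.0]
    labeled = [(0.2, 0), (0.7, 0), (0.9, 1), (5.5, 1)]
    clusters = level_set_clusters(unlabeled, h=0.3, level=0.1, link=0.5)
    labels = fit_cluster_labels(clusters, labeled)
    print(clusters, labels, predict(0.5, clusters, labels))
```

In this sketch the unlabeled points determine only where the clusters are, while the few labeled points decide which label each cluster receives. That split mirrors the two sample sizes mentioned in the abstract.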

Topics

Machine Learning > Learning Types > Semi-Supervised Learning
Machine Learning > Optimization & Theory > Statistical Learning

Keywords

semi-supervised learning, density estimation, generalization error, cluster assumption
