FilterBoost: Regression and Classification on Large Datasets

Joseph K. Bradley; Robert E. Schapire

NeurIPS (NIPS) 2007

Abstract

We study boosting in the filtering setting, where the booster draws examples from an oracle instead of using a fixed training set and so may train efficiently on very large datasets. Our algorithm, which is based on a logistic regression technique proposed by Collins, Schapire, & Singer, requires fewer assumptions to achieve bounds equivalent to or better than previous work. Moreover, we give the first proof that the algorithm of Collins et al. is a strong PAC learner, albeit within the filtering setting. Our proofs demonstrate the algorithm's strong theoretical properties for both classification and conditional probability estimation, and we validate these results through extensive experiments. Empirically, our algorithm proves more robust to noise and overfitting than batch boosters in conditional probability estimation and proves competitive in classification.
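As a rough illustration of the filtering setting described in the abstract, the sketch below draws examples from an oracle and keeps each one by rejection sampling. It assumes a logistic acceptance weight 1 / (1 + exp(y * H(x))), which is a natural choice given the logistic regression technique the abstract mentions but is not stated in the abstract itself. The names `logistic_weight`, `filter_example`, `oracle`, `hypothesis`, and `max_draws` are hypothetical and chosen only for this example.

```python
import math
import random


def logistic_weight(margin):
    """Return 1 / (1 + exp(margin)), computed without overflow.

    `margin` stands for y * H(x): large positive margins (confidently
    correct predictions) get weights near 0, large negative margins
    get weights near 1. This weighting is an assumption of the sketch.
    """
    if margin >= 0:
        e = math.exp(-margin)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(margin))


def filter_example(oracle, hypothesis, rng, max_draws=10_000):
    """Draw from `oracle` until one example passes the filter.

    oracle:     zero-argument callable returning (x, y) with y in {-1, +1}
    hypothesis: callable mapping x to a real-valued score H(x)
    rng:        random.Random instance used for the accept/reject coin
    Returns the accepted (x, y), or None if `max_draws` examples are
    rejected in a row (hypothetical stopping rule for this sketch).
    """
    for _ in range(max_draws):
        x, y = oracle()
        if rng.random() < logistic_weight(y * hypothesis(x)):
            return x, y
    return None
```

Because the booster only ever asks the oracle for one example at a time, nothing here requires holding a fixed training set in memory, which is the property the abstract highlights for very large datasets.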

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Core Methods > Regression
Machine Learning > Learning Types > Active Learning
Machine Learning > Learning Types > Online Learning
Machine Learning > Learning Types > Supervised Learning
Machine Learning > Learning Types > Classification
Machine Learning > Learning Types > Ensemble Methods

Keywords

online learning, classification, logistic regression, PAC learning, regression, conditional probability estimation, large-scale learning, boosting algorithm, large dataset, FilterBoost algorithm
