Progressive mixture rules are deviation suboptimal

Jean-yves Audibert

2007 NIPS NeurIPS 2007

Progressive mixture rules are deviation suboptimal

Abstract

We consider the learning task consisting in predicting as well as the best function in a finite reference set G up to the smallest possible additive term. If R(g) denotes the generalization error of a prediction function g, under reasonable assumptions on the loss function (typically satisfied by the least square loss when the output is bounded), it is known that the progressive mixture rule gn satisfies E R(gn) < min_{g in G} R(g) + Cst (log|G|)/n where n denotes the size of the training set, E denotes the expectation wrt the training set distribution. This work shows that, surprisingly, for appropriate reference sets G, the deviation convergence rate of the progressive mixture rule is only no better than Cst / sqrt{n}, and not the expected Cst / n. It also provides an algorithm which does not suffer from this drawback.

🧭 Keyword Pioneer — progressive mixture rules

🐣 Hot Topic Early Bird — convergence rate

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

📈 Trend Setter — Discrete Optimization

Authors

Jean-yves Audibert

Topics

Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Optimization & Theory > Optimization Machine Learning > Optimization & Theory > Theory Machine Learning > Learning Types > Online Learning Mathematics & Optimization > Optimization > Discrete Optimization Mathematics & Optimization > Optimization > Optimization

Keywords

statistical learning learning theory model selection online learning convex optimization progressive mixture rules deviation bounds reference sets least squares loss deviation convergence prediction bounds sparse optimization convergence rate ensemble method prediction theory additive term

Download PDF

Related papers

Exponential Family Predictive Representations of State 2007

Privacy-Preserving Belief Propagation and Sampling 2007

Efficient Principled Learning of Thin Junction Trees 2007

How SVMs can estimate quantiles and the median 2007

Rapid Inference on a Novel AND/OR graph for Object Detection, Segmentation and Parsing 2007