Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet

Marcus Hutter

JMLR 2003

Abstract

Various optimality properties of universal sequence predictors based on Bayes-mixtures in general, and Solomonoff's prediction scheme in particular, will be studied. The probability of observing x_t at time t, given past observations x_1 ... x_{t-1}, can be computed with the chain rule if the true generating distribution μ of the sequences x_1 x_2 x_3 ... is known. If μ is unknown, but known to belong to a countable or continuous class Μ, one can base one's prediction on the Bayes-mixture ξ, defined as a w_ν-weighted sum or integral of distributions ν ∈ Μ. The cumulative expected loss of the Bayes-optimal universal prediction scheme based on ξ is shown to be close to the loss of the Bayes-optimal, but infeasible, prediction scheme based on μ. We show that the bounds are tight and that no other predictor can lead to significantly smaller bounds. Furthermore, for various performance measures, we show Pareto-optimality of ξ and give an Occam's razor argument that the choice w_ν ∼ 2^{-K(ν)} for the weights is optimal, where K(ν) is the length of the shortest program describing ν. The results are applied to games of chance, defined as a sequence of bets, observations, and rewards. The prediction schemes (and bounds) are compared to the popular predictors based on expert advice. Extensions to infinite alphabets, partial, delayed and probabilistic prediction, classification, and more active systems are briefly discussed.
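
As a rough illustration of the Bayes-mixture idea in the abstract, the sketch below computes the mixture's next-symbol probability for binary sequences via the chain rule, ξ(x_t | x_1 ... x_{t-1}) = ξ(x_1 ... x_t) / ξ(x_1 ... x_{t-1}). It assumes a finite class of three i.i.d. Bernoulli(θ) models with uniform prior weights; the function name `mixture_predictive`, the model class, and the weights are illustrative assumptions and are not taken from the paper.

```python
def mixture_predictive(thetas, prior, history):
    """Return xi(next symbol = 1 | history) for a finite Bernoulli model class.

    thetas  -- hypothetical model class: each theta is P(symbol = 1) under one model nu
    prior   -- prior weights w_nu, one per model, summing to 1
    history -- observed binary symbols x_1 ... x_{t-1}
    """
    # nu(x_1 ... x_{t-1}) for each model (product form, since models are i.i.d.).
    likelihoods = []
    for theta in thetas:
        p = 1.0
        for x in history:
            p *= theta if x == 1 else 1.0 - theta
        likelihoods.append(p)

    # xi(x_1 ... x_{t-1}) = sum over nu of w_nu * nu(x_1 ... x_{t-1})
    joint = [w * lik for w, lik in zip(prior, likelihoods)]
    xi_history = sum(joint)

    # xi(x_1 ... x_{t-1}, 1) = sum over nu of w_nu * nu(x_1 ... x_{t-1}) * theta_nu
    xi_history_then_one = sum(j * theta for j, theta in zip(joint, thetas))

    # Chain rule: conditional probability is the ratio of the two mixture values.
    return xi_history_then_one / xi_history


thetas = [0.2, 0.5, 0.8]          # assumed model class
prior = [1 / 3, 1 / 3, 1 / 3]     # assumed uniform weights w_nu

p_empty = mixture_predictive(thetas, prior, [])
p_after_ones = mixture_predictive(thetas, prior, [1, 1, 1, 1])
print(p_empty, p_after_ones)
```

With no observations the prediction is the prior-weighted average of the models; after a run of ones, the mixture shifts toward the model with the largest θ. For long histories the raw products underflow, so a practical version would work with log-probabilities.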

Topics

Artificial Intelligence > Bayesian & Probabilistic > Bayesian Learning
Machine Learning > Optimization & Theory > Learning Theory
Mathematics & Optimization > Mathematics > Probability
Machine Learning > Bayesian & Probabilistic > Probabilistic Modeling
Machine Learning > Bayesian & Probabilistic > Bayesian Inference

Keywords

learning theory, probabilistic modeling, bayesian inference, sequence prediction, universal predictor, solomonoff induction, occam's razor

Related papers

Bottom-Up Relational Learning of Pattern Matching Rules for Information Extraction 2003

An Efficient Boosting Algorithm for Combining Preferences 2003

A Multiscale Framework For Blind Separation of Linearly Mixed Signals 2003

Word-Sequence Kernels 2003

An Extensive Empirical Study of Feature Selection Metrics for Text Classification 2003