Regret Bounds for Gaussian Process Bandit Problems

Steffen Grünewälder; Jean–Yves Audibert; Manfred Opper; John Shawe–Taylor

2010 AISTATS AISTATS 2010

Regret Bounds for Gaussian Process Bandit Problems

Abstract

Bandit algorithms are concerned with trading exploration with exploitation where a number of options are available but we can only learn their quality by experimenting with them. We consider the scenario in which the reward distribution for arms is modeled by a Gaussian process and there is no noise in the observed reward. Our main result is to bound the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout based on benign assumptions about the covariance function defining the Gaussian process. We further complement these upper bounds with corresponding lower bounds for particular covariance functions demonstrating that in general there is at most a logarithmic looseness in our upper bounds.

🚀 Conference Pioneer — AISTATS 2010

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — bandit optimization

🐣 Hot Topic Early Bird — gaussian process

🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

📈 Trend Setter — Multi-Armed Bandits

Authors

Steffen Grünewälder , Jean–Yves Audibert , Manfred Opper , John Shawe–Taylor

Topics

Machine Learning > Optimization & Theory > Bayesian Inference Machine Learning > Optimization & Theory > Learning Theory Mathematics & Optimization > Optimization > Stochastic Methods Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

gaussian process sequential decision bayesian optimization multi-armed bandit regret bound bandit optimization

Download PDF

Related papers

Towards Understanding Situated Natural Language 2010

Mass Fatality Incident Identification based on nuclear DNA evidence 2010

Locally Linear Denoising on Image Manifolds 2010

Negative Results for Active Learning with Convex Losses 2010

Collaborative Filtering on a Budget 2010