Improved Learning Complexity in Combinatorial Pure Exploration Bandits

Victor Gabillon; Alessandro Lazaric; Mohammad Ghavamzadeh; Ronald Ortner; Peter Bartlett

AISTATS 2016

Abstract

We study combinatorial pure exploration in the stochastic multi-armed bandit problem. We first construct a new measure of complexity that provably characterizes the learning performance of the algorithms we propose for the fixed-confidence and fixed-budget settings. We show that this complexity is never higher than the one in existing work and illustrate a number of configurations in which it can be significantly smaller. While in general this improvement comes at the cost of increased computational complexity, we provide a series of examples, including a planning problem, where this extra cost is not significant.


Topics

Machine Learning > Learning Types > Active Learning
Mathematics & Optimization > Optimization > Combinatorial Optimization
Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

combinatorial optimization; sample complexity; multi-armed bandit; pure exploration; learning complexity; combinatorial pure exploration; fixed confidence setting; fixed budget setting
