A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences

Odalric-Ambrym Maillard; Rémi Munos; Gilles Stoltz

COLT 2011

Abstract

We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996). Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).
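As a rough illustration of the quantity the abstract refers to, the sketch below computes the Kullback-Leibler divergence between two distributions on a common finite support. It is not the algorithm analysed in the paper; the function name `kl_divergence` and the list-of-probabilities representation are assumptions made for this example only.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) for two probability vectors on the same finite support.

    Hypothetical helper for illustration; not taken from the paper.
    Conventions: terms with p_i == 0 contribute 0, and the divergence is
    infinite when p puts mass on a point where q has none.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same support")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # 0 * log(0 / q_i) is taken to be 0
        if q_i == 0.0:
            return math.inf  # p is not absolutely continuous w.r.t. q
        total += p_i * math.log(p_i / q_i)
    return total


if __name__ == "__main__":
    # Two reward distributions on the support {0, 0.5, 1}.
    arm_a = [0.2, 0.5, 0.3]
    arm_b = [0.1, 0.4, 0.5]
    print(kl_divergence(arm_a, arm_b))
```

The divergence is not symmetric, so `kl_divergence(arm_a, arm_b)` and `kl_divergence(arm_b, arm_a)` generally differ.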

Topics

Machine Learning > Optimization & Theory > Learning Theory

Machine Learning > Optimization & Theory > Optimization

Machine Learning > Learning Types > Online Learning

Machine Learning > Optimization & Theory > Stochastic Methods

Mathematics & Optimization > Optimization > Optimization

Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

stochastic optimization, regret minimization, Kullback-Leibler divergence, finite-time analysis, finite-sample analysis, multi-armed bandit, regret bound

Related papers

Competitive Closeness Testing 2011

Bandits, Query Learning, and the Haystack Dimension 2011

Minimax Policies for Combinatorial Prediction Games 2011

Sample Complexity Bounds for Differentially Private Learning 2011

Multiclass Learnability and the ERM principle 2011