COLT 2011

A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences

Abstract

We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996). Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).
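As a rough illustration of the quantities involved, the sketch below computes the Kullback-Leibler divergence between two distributions on a common finite support, and the constant that multiplies log T in the asymptotic lower bound on the regret. It is not the algorithm analyzed in the paper. It assumes Bernoulli arms, for which that constant can be written with the Bernoulli KL divergence; the function names (`kl_divergence`, `bernoulli_kl`, `lower_bound_constant`) are hypothetical and chosen only for this example.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two probability vectors on the same finite support."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def bernoulli_kl(a, b):
    """KL divergence between Bernoulli(a) and Bernoulli(b)."""
    return kl_divergence([1.0 - a, a], [1.0 - b, b])


def lower_bound_constant(means):
    """Sum over suboptimal arms of gap / KL(arm mean, best mean).

    Assumption: Bernoulli arms. For general finite-support
    distributions the denominator is an infimum of KL divergences
    over a set of distributions, which this sketch does not compute.
    """
    best = max(means)
    return sum(
        (best - m) / bernoulli_kl(m, best)
        for m in means
        if m < best
    )
```

For example, `lower_bound_constant([0.4, 0.5])` gives the asymptotic regret rate per unit of log T for a two-armed Bernoulli problem. By Pinsker's inequality this value is never larger than the sum of 1 / (2 × gap) over suboptimal arms, which is the sense in which KL-based constants can improve on gap-based ones.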
