Almost Optimal Exploration in Multi-Armed Bandits

Zohar Karnin; Tomer Koren; Oren Somekh

ICML 2013

Abstract

We study the problem of exploration in stochastic Multi-Armed Bandits. Even in the simplest setting of identifying the best arm, there remains a logarithmic multiplicative gap between the known lower and upper bounds for the number of arm pulls required for the task. This extra logarithmic factor is quite meaningful in today's large-scale applications. We present two novel, parameter-free algorithms for identifying the best arm, in two different settings: given a target confidence and given a target budget of arm pulls, for which we prove upper bounds whose gap from the lower bound is only doubly-logarithmic in the problem parameters. We corroborate our theoretical results with experiments demonstrating that our algorithm outperforms the state-of-the-art and scales better as the size of the problem increases.
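To make the fixed-budget setting concrete, here is a minimal Python sketch of a halving-style elimination scheme: the pull budget is split evenly over about log2(n) rounds, and the worse half of the surviving arms is dropped after each round. The abstract does not specify the authors' algorithms, so this is an illustration of the problem setting under that assumption, not their method. The names `halving_best_arm`, `make_bernoulli_arms`, and `pull` are hypothetical.

```python
import math
import random


def halving_best_arm(pull, n_arms, budget):
    """Guess the best arm using at most `budget` pulls (illustrative sketch).

    `pull(arm)` is an assumed callback returning one stochastic reward.
    The budget bound holds when budget >= n_arms * ceil(log2(n_arms)).
    """
    survivors = list(range(n_arms))
    rounds = max(1, math.ceil(math.log2(n_arms)))
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        # Split the budget evenly across rounds, then across surviving arms.
        pulls_per_arm = max(1, budget // (len(survivors) * rounds))
        means = {
            arm: sum(pull(arm) for _ in range(pulls_per_arm)) / pulls_per_arm
            for arm in survivors
        }
        # Keep the better half (rounded up) by empirical mean.
        survivors.sort(key=lambda arm: means[arm], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]


def make_bernoulli_arms(success_probs, seed=0):
    """Build a `pull` callback for Bernoulli arms (hypothetical test helper)."""
    rng = random.Random(seed)
    return lambda arm: 1.0 if rng.random() < success_probs[arm] else 0.0


if __name__ == "__main__":
    probs = [0.2, 0.4, 0.5, 0.8]
    guess = halving_best_arm(make_bernoulli_arms(probs), len(probs), budget=2000)
    print("estimated best arm:", guess)
```

The fixed-confidence setting differs in that the number of pulls is not capped in advance; the procedure instead keeps sampling until it can return the best arm with the target probability.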

Topics

Machine Learning > Optimization & Theory > Learning Theory
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Optimization & Theory > Online Algorithms
Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

logarithmic regret, best arm identification, multi-armed bandit, regret bound, exploration algorithm, arm pull budget, doubly-logarithmic bound
