Bounded regret in stochastic multi-armed bandits

Sébastien Bubeck; Vianney Perchet; Philippe Rigollet

COLT 2013

Abstract

We study the stochastic multi-armed bandit problem when one knows the value μ^(⋆) of an optimal arm, as well as a positive lower bound on the smallest positive gap ∆. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows ∆, and bounded regret of order 1/∆ is not possible if one only knows μ^(⋆).
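As a rough illustration of the setting, the following Python sketch simulates Bernoulli arms and a simple threshold rule that uses the known optimal value and a gap lower bound. It is not the randomized policy proposed in the paper, whose details are not given in this abstract. The function name `simulate_threshold_policy`, the Bernoulli reward model, the threshold μ^(⋆) − ∆/2, and the round-robin exploration are all assumptions made for illustration only.

```python
import random


def simulate_threshold_policy(means, delta, horizon, seed=0):
    """Simulate a hypothetical threshold rule on Bernoulli arms.

    Returns (pseudo_regret, pull_counts). The learner is assumed to know
    mu_star = max(means) and a lower bound `delta` on the smallest positive gap.
    """
    rng = random.Random(seed)
    k = len(means)
    mu_star = max(means)              # known to the learner in this setting
    threshold = mu_star - delta / 2   # assumed cut-off, chosen for illustration
    pulls = [0] * k
    sums = [0.0] * k
    regret = 0.0
    next_rr = 0                       # next arm in the round-robin order

    for t in range(horizon):
        if t < k:
            arm = t                   # initial pass: pull every arm once
        else:
            avgs = [sums[i] / pulls[i] for i in range(k)]
            best = max(range(k), key=lambda i: avgs[i])
            if avgs[best] > threshold:
                arm = best            # an arm looks optimal: commit to it for now
            else:
                arm = next_rr         # no arm looks optimal: keep exploring
                next_rr = (next_rr + 1) % k
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        regret += mu_star - means[arm]  # pseudo-regret uses the true means

    return regret, pulls


if __name__ == "__main__":
    regret, pulls = simulate_threshold_policy([0.9, 0.6], delta=0.3, horizon=10_000)
    print("pseudo-regret:", regret, "pulls per arm:", pulls)
```

Running the sketch for several horizons and seeds gives an informal feel for how the accumulated pseudo-regret behaves over time. It proves nothing about the bounds stated in the abstract.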

Topics

Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Learning Types > Online Learning
Machine Learning > Optimization & Theory > Online Algorithms
Machine Learning > Learning Types > Multi-Agent Systems
Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

stochastic optimization, multi-armed bandit, regret bound, stochastic bandit, bandit algorithm, stochastic policy, bounded regret, optimal arm, randomized policy, gap estimation
