2013
COLT
COLT 2013
Bounded regret in stochastic multi-armed bandits
Abstract
We study the stochastic multi-armed bandit problem when one knows the value μ^(⋆) of an optimal arm, as a well as a positive lower bound on the smallest positive gap ∆. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows ∆, and bounded regret of order 1/∆is not possible if one only knows μ^(⋆).
🧭
Keyword Pioneer
— stochastic policy
🐣
Hot Topic Early Bird
— multi-armed bandit
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Data Science & Analytics, Deep Learning, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning
🌉
Interdisciplinary Bridge
— Machine Learning and Mathematics & Optimization