2019 COLT COLT 2019

Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes

Abstract

We consider the variant of the stochastic multi-armed bandit problem where the stochastic reward distributions may change abruptly several times. In contrast to previous work, we are able to achieve (nearly) optimal mini-max regret bounds without knowing the number of changes. For this setting, we propose an algorithm called ADSWITCH and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. Our regret bound is the first optimal bound for an algorithm that is not tuned with respect to the number of changes.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization
🧭 Keyword Pioneer — non-stationary bandit
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy