Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes

Peter Auer; Pratik Gajane; Ronald Ortner

COLT 2019

Abstract

We consider the variant of the stochastic multi-armed bandit problem in which the reward distributions may change abruptly several times. In contrast to previous work, we achieve (nearly) optimal minimax regret bounds without knowing the number of changes. For this setting, we propose an algorithm called ADSWITCH and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. Our regret bound is the first optimal bound for an algorithm that is not tuned with respect to the number of changes.
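To make the regret notion in the abstract concrete, the following is a minimal sketch of how regret against the optimal non-stationary policy could be computed in a piecewise-stationary bandit. It is not the ADSWITCH algorithm, which the abstract does not specify. The function name `dynamic_regret`, the representation of changes as a list of change points, and the toy reward means are all assumptions made for illustration.

```python
def dynamic_regret(pulls, means_per_segment, change_points):
    """Expected regret of a pull sequence against the policy that
    plays the best arm at every step (hypothetical helper).

    pulls             -- arm index chosen at each step t = 0, 1, ...
    means_per_segment -- one list of arm means per stationary segment
    change_points     -- sorted steps at which the means switch
    """
    regret = 0.0
    segment = 0
    for t, arm in enumerate(pulls):
        # Advance to the stationary segment that contains step t.
        while segment < len(change_points) and t >= change_points[segment]:
            segment += 1
        means = means_per_segment[segment]
        # Per-step gap between the currently best arm and the pulled arm.
        regret += max(means) - means[arm]
    return regret


# Toy instance (assumed values): 2 arms, one abrupt change at t = 5.
means_per_segment = [[0.9, 0.1], [0.1, 0.9]]
change_points = [5]

always_arm_0 = [0] * 10          # ignores the change
tracking = [0] * 5 + [1] * 5     # switches exactly at the change

regret_static = dynamic_regret(always_arm_0, means_per_segment, change_points)
regret_tracking = dynamic_regret(tracking, means_per_segment, change_points)
```

In this toy instance, the policy that never switches pays the gap between the two arms on every step after the change, while the policy that switches at the change point incurs no expected regret. An algorithm in the setting of the abstract would have to approach the latter behavior without being told when, or how often, the changes occur.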

Authors

Peter Auer, Pratik Gajane, Ronald Ortner

Topics

Machine Learning > Optimization & Theory > Learning Theory

Mathematics & Optimization > Mathematics > Statistics

Mathematics & Optimization > Optimization > Online Algorithms

Keywords

dynamic regret, minimax regret, multi-armed bandit, regret bound, adaptive algorithm, non-stationary bandit
