Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits

Lilian Besson; Emilie Kaufmann; Odalric-ambrym Maillard; Julien Seznec

2022 JMLR JMLR 2022

Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits

Abstract

We introduce GLRklUCB, a novel algorithm for the piecewise iid non-stationary bandit problem with bounded rewards. This algorithm combines an efficient bandit algorithm, klUCB, with an efficient, parameter-free, change-point detector, the Bernoulli Generalized Likelihood Ratio Test, for which we provide new theoretical guarantees of independent interest. Unlike previous non-stationary bandit algorithms using a change-point detector, GLRklUCB does not need to be calibrated based on prior knowledge on the arms' means. We prove that this algorithm can attain a $O(\sqrt{TA\Upsilon_T\log(T)})$ regret in $T$ rounds on some “easy” instances in which there is sufficient delay between two change-points, where $A$ is the number of arms and $\Upsilon_T$ the number of change-points, without prior knowledge of $\Upsilon_T$. In contrast with recently proposed algorithms that are agnostic to $\Upsilon_T$, we perform a numerical study showing that GLRklUCB is also very efficient in practice, beyond easy instances. [abs] [ pdf ][ bib ] [ code ] © JMLR 2022. (edit, beta)

🧭 Keyword Pioneer — piecewise-stationary bandit

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Lilian Besson , Emilie Kaufmann , Odalric-ambrym Maillard , Julien Seznec

Topics

Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Optimization & Theory > Theory Machine Learning > Learning Types > Online Learning Machine Learning > Optimization & Theory > Online Algorithms Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

change-point detection generalized likelihood ratio multi-armed bandit regret bound bandit algorithm non-stationary environment non-stationary learning piecewise-stationary bandit

Download PDF

Related papers

Prior Adaptive Semi-supervised Learning with Application to EHR Phenotyping 2022

LinCDE: Conditional Density Estimation via Lindsey's Method 2022

Causal Classification: Treatment Effect Estimation vs. Outcome Prediction 2022

Provable Tensor-Train Format Tensor Completion by Riemannian Optimization 2022

Power Iteration for Tensor PCA 2022