Batched Bandit Problems

Vianney Perchet; Philippe Rigollet; Sylvain Chassang; Erik Snowberg

COLT 2015

Abstract

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches already yields close to minimax-optimal regret bounds, and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
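To make the batching constraint concrete, here is a minimal Python sketch of a two-armed policy that may only change its allocation at batch boundaries. It is an illustration under stated assumptions, not the paper's policy: the function name `batched_two_armed_policy`, the Bernoulli rewards, the batch grid, and the elimination threshold are all hypothetical choices made for this example.

```python
import math
import random


def batched_two_armed_policy(means, horizon, batch_ends, seed=0):
    """Simulate a batched elimination-style policy on two Bernoulli arms.

    means      -- true success probabilities of the two arms (assumed known
                  only to the simulator, never to the policy)
    horizon    -- total number of trials T
    batch_ends -- increasing trial counts at which each batch ends; the last
                  entry should equal horizon
    Returns (expected regret, pulls per arm).
    """
    rng = random.Random(seed)
    pulls = [0, 0]
    sums = [0.0, 0.0]
    active = [0, 1]          # arms still under consideration
    best = max(means)
    regret = 0.0
    t = 0
    for end in batch_ends:
        # The allocation for the whole batch is fixed before it starts:
        # alternate between active arms, with no reaction to rewards.
        while t < end:
            arm = active[t % len(active)]
            reward = 1.0 if rng.random() < means[arm] else 0.0
            pulls[arm] += 1
            sums[arm] += reward
            regret += best - means[arm]
            t += 1
        # Feedback is used only here, at the batch boundary.
        if len(active) == 2:
            avg = [sums[a] / pulls[a] for a in (0, 1)]
            # Illustrative confidence radius (an assumption, not the paper's).
            radius = math.sqrt(2.0 * math.log(horizon) / min(pulls))
            if abs(avg[0] - avg[1]) > radius:
                active = [0 if avg[0] > avg[1] else 1]
    return regret, pulls


# Three batches over 10,000 trials; the grid is chosen only for illustration.
regret, pulls = batched_two_armed_policy([0.9, 0.1], 10_000, [100, 1_000, 10_000])
print(regret, pulls)
```

The policy makes at most one decision per batch, so with three batches it switches its allocation at most twice. How the batch sizes should be chosen so that regret stays close to the unbatched minimax rate is the question the paper addresses.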

Authors

Vianney Perchet, Philippe Rigollet, Sylvain Chassang, Erik Snowberg

Topics

Machine Learning > Optimization & Theory > Learning Theory
Mathematics & Optimization > Optimization > Online Algorithms

Keywords

multi-armed bandit, regret bound, clinical trial, switching cost, batched learning
