Regret Bounds for Batched Bandits

Hossein Esfandiari; Amin Karbasi; Abbas Mehrabian; Vahab Mirrokni

AAAI 2021

Abstract

We present simple algorithms for batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve and extend the best known regret bounds of Gao, Han, Ren, and Zhou (NeurIPS 2019), for any number of batches. In particular, our algorithms in both settings achieve the optimal expected regrets by using only a logarithmic number of batches. We also study the batched adversarial multi-armed bandit problem for the first time and provide the optimal regret, up to logarithmic factors, of any algorithm with predetermined batch sizes.
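To make the batched setting concrete, the following is a minimal sketch of a generic batched elimination scheme for stochastic multi-armed bandits. It is an illustration under stated assumptions, not necessarily the algorithm analyzed in the paper: the function name batched_elimination, the geometric batch grid q = T^(1/B), the Bernoulli rewards, and the Hoeffding-style confidence radius are all choices made for this example. The point it shows is that rewards are only examined at batch boundaries, so the set of active arms changes at most B times.

```python
import math
import random


def batched_elimination(means, horizon, num_batches, seed=0):
    """Illustrative batched arm elimination on Bernoulli arms.

    `means` are the true arm means (used only to simulate rewards and to
    report pseudo-regret). All names and constants here are assumptions
    made for this sketch.
    """
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    pulls = [0] * k
    sums = [0.0] * k
    remaining = horizon
    batches_used = 0
    q = horizon ** (1.0 / num_batches)  # assumed geometric batch growth

    def pull(arm, n):
        # Play `arm` n times; the rewards are only inspected after the batch.
        nonlocal remaining
        sums[arm] += sum(1.0 for _ in range(n) if rng.random() < means[arm])
        pulls[arm] += n
        remaining -= n

    def emp_mean(arm):
        return sums[arm] / pulls[arm] if pulls[arm] else 0.0

    # Exploration batches: every active arm gets the same number of pulls.
    for b in range(1, num_batches):
        n = min(int(q ** b), remaining // len(active))
        if n == 0 or len(active) == 1:
            break
        for arm in active:
            pull(arm, n)
        batches_used += 1
        # Feedback arrives now: drop arms that look clearly suboptimal.
        count = pulls[active[0]]  # identical for all active arms
        radius = math.sqrt(2 * math.log(2 * k * horizon * num_batches) / count)
        best = max(emp_mean(a) for a in active)
        active = [a for a in active if emp_mean(a) >= best - radius]

    # Final batch: commit the remaining budget to the empirically best arm.
    if remaining > 0:
        pull(max(active, key=emp_mean), remaining)
        batches_used += 1

    regret = sum(pulls[a] * (max(means) - means[a]) for a in range(k))
    return {"pulls": pulls, "active": active,
            "batches_used": batches_used, "regret": regret}


result = batched_elimination([0.9, 0.5, 0.2], horizon=10_000, num_batches=5)
print(result)
```

In this sketch the number of batches is a parameter. Setting it on the order of log T makes q a constant, which is the regime the abstract refers to when it speaks of a logarithmic number of batches.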

Topics

Machine Learning > Optimization & Theory > Learning Theory
Machine Learning > Optimization & Theory > Online Algorithms

Keywords

multi-armed bandit, regret bound, stochastic bandit, adversarial bandit, batched bandit
