Stochastic bandits with arm-dependent delays

Manegueu Anne Gael; Claire Vernade; Alexandra Carpentier; Michal Valko

ICML 2020

Abstract

Significant work has recently been dedicated to stochastic delayed bandits because of their relevance in applications. However, the applicability of existing algorithms is restricted by the strong assumptions often made on the delay distributions, such as full observability, restrictive shape constraints, or uniformity over arms. In this work, we weaken these assumptions significantly and only assume that there is a bound on the tail of the delay. In particular, we cover the important case where the delay distributions vary across arms, and the case where the delays are heavy-tailed. To address these difficulties, we propose a simple but efficient UCB-based algorithm called PatientBandits. We provide both problem-dependent and problem-independent bounds on the regret, as well as performance lower bounds.
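To make the setting concrete, the following is a minimal Python sketch of a bandit simulation in which each arm has its own delay distribution and a reward only becomes visible once its delay has elapsed. It is not the PatientBandits algorithm, whose details are not given in the abstract; it runs a generic UCB1 index on whatever feedback has arrived so far. The function name `delayed_ucb`, the Bernoulli rewards, and the Pareto delay sampler are all assumptions chosen for illustration.

```python
import math
import random


def delayed_ucb(means, delay_samplers, horizon, seed=0):
    """Generic UCB1 on feedback that arrives after a random per-arm delay.

    Illustrative only: this is NOT the paper's PatientBandits algorithm.
    `means` are hypothetical Bernoulli reward means, and each entry of
    `delay_samplers` maps a random.Random instance to an integer delay.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    pulls = [0] * n_arms         # number of times each arm was played
    observed = [0] * n_arms      # number of rewards that have arrived
    reward_sum = [0.0] * n_arms  # sum of the rewards that have arrived
    pending = []                 # (arrival_round, arm, reward) not yet seen
    regret = 0.0
    best = max(means)

    for t in range(1, horizon + 1):
        # Deliver every piece of feedback whose delay has elapsed.
        still_pending = []
        for arrival, arm, reward in pending:
            if arrival <= t:
                observed[arm] += 1
                reward_sum[arm] += reward
            else:
                still_pending.append((arrival, arm, reward))
        pending = still_pending

        if t <= n_arms:
            # Initialisation: play every arm once.
            arm = t - 1
        else:
            def index(a):
                # An arm with no arrived feedback is treated optimistically.
                if observed[a] == 0:
                    return float("inf")
                mean = reward_sum[a] / observed[a]
                return mean + math.sqrt(2.0 * math.log(t) / observed[a])

            arm = max(range(n_arms), key=index)

        reward = 1.0 if rng.random() < means[arm] else 0.0
        delay = delay_samplers[arm](rng)  # arm-dependent delay
        pending.append((t + delay, arm, reward))
        pulls[arm] += 1
        regret += best - means[arm]

    return pulls, observed, regret


if __name__ == "__main__":
    samplers = [
        lambda r: 0,                          # feedback visible next round
        lambda r: int(r.paretovariate(1.5)),  # heavy-tailed (Pareto) delay
    ]
    print(delayed_ucb([0.3, 0.7], samplers, horizon=2000, seed=1))
```

In this sketch an arm whose feedback is slow to arrive keeps an infinite index and is replayed until its first reward shows up, which hints at why naive index policies can struggle when delays differ across arms or are heavy-tailed.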

Topics

Mathematics & Optimization > Optimization > Stochastic Methods
Machine Learning > Learning Types > Reinforcement Learning

Keywords

multi-armed bandit, upper confidence bound, regret bound, delayed feedback, heavy-tailed distribution, stochastic delay
