Delayed Feedback in Generalised Linear Bandits Revisited

Benjamin Howson; Ciara Pike-Burke; Sarah Filippi

2023 AISTATS AISTATS 2023

Delayed Feedback in Generalised Linear Bandits Revisited

Abstract

The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, the stringent requirement for immediate rewards is unmet in many real-world applications where the reward is almost always delayed. We study the phenomenon of delayed rewards in generalised linear bandits in a theoretical manner. We show that a natural adaptation of an optimistic algorithm to the delayed feedback setting can achieve regret of $\widetilde{\mathcal{O}}(d\sqrt{T} + d^{3/2}\mathbb{E}[\tau]\,)$, where $\mathbb{E}[\tau]$ denotes the expected delay, $d$ is the dimension and $T$ is the time horizon. This significantly improves upon existing approaches for this setting where the best known regret bound was $ \widetilde{\mathcal{O}}(\sqrt{dT}\sqrt{d + \mathbb{E}[\tau]}\,)$. We verify our theoretical results through experiments on simulated data.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — generalized linear bandit

🐝 Cross-Pollinator — Artificial Intelligence, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

🐣 Hot Topic Early Bird — sequential decision-making

Authors

Benjamin Howson , Ciara Pike-Burke , Sarah Filippi

Topics

Artificial Intelligence > Learning Paradigms > Meta-Learning Machine Learning > Optimization & Theory > Optimization Mathematics & Optimization > Optimization > Online Algorithms Machine Learning > Optimization & Theory > Online Algorithms Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

sequential decision-making regret bound delayed feedback optimistic algorithm stochastic bandit generalized linear bandit reward delay generalised linear bandit

Download PDF

Related papers

Safe Sequential Testing and Effect Estimation in Stratified Count Data 2023

Who Should Predict? Exact Algorithms For Learning to Defer to Humans 2023

An Online and Unified Algorithm for Projection Matrix Vector Multiplication with Application to Empirical Risk Minimization 2023

Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods 2023

The Ordered Matrix Dirichlet for State-Space Models 2023