Residual bootstrap exploration for stochastic linear bandit

Shuang Wu; Chi-Hua Wang; Yuantong Li; Guang Cheng

UAI 2022

Abstract

We propose a new bootstrap-based online algorithm for stochastic linear bandit problems. The key idea is to adopt residual bootstrap exploration, in which the agent estimates the next-step reward by resampling the residuals of the mean reward estimate. Our algorithm, residual bootstrap exploration for stochastic linear bandit (\texttt{LinReBoot}), estimates the linear reward from its resampling distribution and pulls the arm with the highest reward estimate. In particular, we contribute a theoretical framework that demystifies residual bootstrap-based exploration mechanisms in stochastic linear bandit problems. The key insight is that the strength of bootstrap exploration rests on collaborated optimism between the online-learned model and the resampling distribution of the residuals. This observation enables us to show that the proposed \texttt{LinReBoot} secures a high-probability $\tilde{O}(d \sqrt{n})$ sub-linear regret under mild conditions. Our experiments support the easy generalizability of the \texttt{ReBoot} principle across various formulations of linear bandit problems and show the significant computational efficiency of \texttt{LinReBoot}.
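
The exploration idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' \texttt{LinReBoot} implementation: the ridge estimator, the Gaussian multiplier weights, the forced initial pull of each arm, and every function and parameter name (`ridge_estimate`, `residual_bootstrap_index`, `run`, `lam`, `noise_sd`) are assumptions made for illustration, and the sketch omits any variance inflation or pseudo-residuals that the paper's analysis may require.

```python
import numpy as np


def ridge_estimate(X, y, lam=1.0):
    """Ridge-regression estimate of the linear reward parameter (assumed estimator)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def residual_bootstrap_index(arms, history, theta_hat, rng):
    """Perturb each arm's estimated mean reward by resampled residuals.

    history[k] holds the rewards observed so far for arm k.
    Gaussian multiplier weights are an illustrative choice, not the paper's.
    """
    index = np.empty(len(arms))
    for k, x in enumerate(arms):
        mean_est = x @ theta_hat
        rewards = np.asarray(history[k], dtype=float)
        if rewards.size == 0:
            index[k] = np.inf  # pull every arm once before bootstrapping
            continue
        residuals = rewards - mean_est
        weights = rng.standard_normal(rewards.size)
        index[k] = mean_est + weights @ residuals / rewards.size
    return index


def run(arms, theta_star, n_rounds, noise_sd=0.1, lam=1.0, seed=0):
    """Simulate the sketch on a fixed arm set with Gaussian reward noise."""
    rng = np.random.default_rng(seed)
    n_arms, d = arms.shape
    history = [[] for _ in range(n_arms)]
    X_rows, y_vals, pulls = [], [], []
    theta_hat = np.zeros(d)
    for _ in range(n_rounds):
        index = residual_bootstrap_index(arms, history, theta_hat, rng)
        k = int(np.argmax(index))  # pull the arm with the highest perturbed estimate
        reward = arms[k] @ theta_star + noise_sd * rng.standard_normal()
        history[k].append(reward)
        X_rows.append(arms[k])
        y_vals.append(reward)
        theta_hat = ridge_estimate(np.array(X_rows), np.array(y_vals), lam)
        pulls.append(k)
    return theta_hat, np.array(pulls)


if __name__ == "__main__":
    arms = np.eye(3)  # three orthogonal arms (toy example)
    theta_hat, pulls = run(arms, np.array([0.1, 0.5, 0.9]), n_rounds=200)
    print("estimated parameter:", theta_hat)
    print("pull counts per arm:", np.bincount(pulls, minlength=3))
```

In this sketch the randomness that drives exploration comes only from the reweighted residuals, so an arm whose rewards are poorly explained by the current model receives a wider spread of index values and is pulled more often.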

Topics

Artificial Intelligence > Learning Paradigms > Few-Shot Learning
Machine Learning > Core Methods > Regression
Machine Learning > Optimization & Theory > Stochastic Processes

Keywords

online learning, regret bound, stochastic linear bandit, bootstrap exploration
