Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

Branislav Kveton; Csaba Szepesvári; Sharan Vaswani; Zheng Wen; Tor Lattimore; Mohammad Ghavamzadeh

ICML 2019

Abstract

We propose a bandit algorithm that explores by randomizing its history of rewards. Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards. We design the pseudo rewards such that the bootstrap mean is optimistic with a sufficiently high probability. We call our algorithm Giro, which stands for garbage in, reward out. We analyze Giro in a Bernoulli bandit and derive an $O(K \Delta^{-1} \log n)$ bound on its $n$-round regret, where $\Delta$ is the difference in the expected rewards of the optimal and the best suboptimal arms, and $K$ is the number of arms. The main advantage of our exploration design is that it easily generalizes to structured problems. To show this, we propose contextual Giro with an arbitrary reward generalization model. We evaluate Giro and its contextual variant on multiple synthetic and real-world problems, and observe that it performs well.
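The exploration scheme described in the abstract can be illustrated with a minimal Python sketch for a Bernoulli bandit. The abstract does not specify how the pseudo rewards are constructed, so the sketch assumes that each observed reward is stored together with one pseudo reward of 1 and one of 0. The function name `giro_bernoulli`, the parameter `pseudo_per_reward`, and the rule of pulling every arm once before bootstrapping are all illustrative choices, not taken from the source.

```python
import random


def giro_bernoulli(means, n_rounds, pseudo_per_reward=1, seed=0):
    """Sketch of bootstrap exploration with pseudo rewards (Bernoulli arms).

    `means` holds the true success probabilities, used only to simulate
    rewards. Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    history = [[] for _ in range(n_arms)]  # per-arm real and pseudo rewards
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        scores = []
        for arm in range(n_arms):
            h = history[arm]
            if not h:
                # Assumption: an arm with no history is pulled first.
                scores.append(float("inf"))
                continue
            # Non-parametric bootstrap: resample the history with replacement
            # and score the arm by the mean of the resample.
            sample = rng.choices(h, k=len(h))
            scores.append(sum(sample) / len(sample))

        # Pull the arm with the highest bootstrap mean, breaking ties at random.
        best = max(scores)
        arm = rng.choice([i for i, s in enumerate(scores) if s == best])

        reward = 1 if rng.random() < means[arm] else 0
        history[arm].append(reward)
        # Assumed pseudo-reward design: add one 1 and one 0 per observed
        # reward so the resampled mean keeps enough variance to be
        # optimistic with non-negligible probability.
        history[arm].extend([1, 0] * pseudo_per_reward)
        pulls[arm] += 1

    return pulls
```

For example, `giro_bernoulli([0.2, 0.5, 0.8], n_rounds=1000)` returns a list of per-arm pull counts. In this sketch the history of an arm grows with every pull, so each round costs time linear in the number of past pulls.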

Topics

Artificial Intelligence > Core AI > Agent Systems; Machine Learning > Learning Types > Active Learning; Mathematics & Optimization > Optimization > Stochastic Methods

Keywords

regret minimization; multi-armed bandit; contextual bandit; exploration strategy; bootstrap sampling
