TS-UCB: Improving on Thompson Sampling With Little to No Additional Computation

Jackie Baek; Vivek Farias

2023 AISTATS AISTATS 2023

TS-UCB: Improving on Thompson Sampling With Little to No Additional Computation

Abstract

Thompson sampling has become a ubiquitous approach to online decision problems with bandit feedback. The key algorithmic task for Thompson sampling is drawing a sample from the posterior of the optimal action. We propose an alternative arm selection rule we dub TS-UCB, that requires negligible additional computational effort but provides significant performance improvements relative to Thompson sampling. At each step, TS-UCB computes a score for each arm using two ingredients: posterior sample(s) and upper confidence bounds. TS-UCB can be used in any setting where these two quantities are available, and it is flexible in the number of posterior samples it takes as input. TS-UCB achieves materially lower regret on a comprehensive suite of synthetic and real-world datasets, including a personalized article recommendation dataset from Yahoo! and a suite of benchmark datasets from a deep bandit suite proposed in Riquelme et al. (2018). Finally, from a theoretical perspective, we establish optimal regret guarantees for TS-UCB for both the K-armed and linear bandit models.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — online decision

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Jackie Baek , Vivek Farias

Topics

Machine Learning > Optimization & Theory > Stochastic Processes Mathematics & Optimization > Optimization > Online Algorithms Machine Learning > Optimization & Theory > Online Algorithms Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

thompson sampling multi-armed bandit upper confidence bound regret bound online algorithm online decision

Download PDF

Related papers

Safe Sequential Testing and Effect Estimation in Stratified Count Data 2023

Who Should Predict? Exact Algorithms For Learning to Defer to Humans 2023

An Online and Unified Algorithm for Projection Matrix Vector Multiplication with Application to Empirical Risk Minimization 2023

Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods 2023

The Ordered Matrix Dirichlet for State-Space Models 2023