Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox

Raymond Zhang; Richard Combes

2024 NIPS NeurIPS 2024

Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox

Abstract

We consider Thompson Sampling (TS) for linear combinatorial semi-bandits and subgaussian rewards. We propose the first known TS whose finite-time regret does not scale exponentially with the dimension of the problem. We further show the mismatched sampling paradox: A learner who knows the rewards distributions and samples from the correct posterior distribution can perform exponentially worse than a learner who does not know the rewards and simply samples from a well-chosen Gaussian posterior. The code used to generate the experiments is available at https://github.com/RaymZhang/CTS-Mismatched-Paradox

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Mathematics & Optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

🧭 Keyword Pioneer — subgaussian reward

Authors

Raymond Zhang , Richard Combes

Topics

Artificial Intelligence > Bayesian & Probabilistic > Bayesian Learning Machine Learning > Optimization & Theory > Bayesian Inference Mathematics & Optimization > Optimization > Online Algorithms Machine Learning > Optimization & Theory > Online Algorithms Machine Learning > Optimization & Theory > Stochastic Methods Machine Learning > Learning Types > Multi-Armed Bandits Machine Learning > Bayesian & Probabilistic > Bayesian Optimization

Keywords

bayesian inference posterior sampling posterior distribution thompson sampling regret bound combinatorial bandit subgaussian reward linear combinatorial semi-bandit mismatched sampling polynomial regret

Download PDF

Related papers

SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers 2024

Training for Stable Explanation for Free 2024

NeuralSolver: Learning Algorithms For Consistent and Efficient Extrapolation Across General Tasks 2024

Expectation Alignment: Handling Reward Misspecification in the Presence of Expectation Mismatch 2024

MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence 2024