2024
NIPS
NeurIPS 2024
Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox
Abstract
We consider Thompson Sampling (TS) for linear combinatorial semi-bandits and subgaussian rewards. We propose the first known TS whose finite-time regret does not scale exponentially with the dimension of the problem. We further show the mismatched sampling paradox: A learner who knows the rewards distributions and samples from the correct posterior distribution can perform exponentially worse than a learner who does not know the rewards and simply samples from a well-chosen Gaussian posterior. The code used to generate the experiments is available at https://github.com/RaymZhang/CTS-Mismatched-Paradox
🌉
Interdisciplinary Bridge
— Artificial Intelligence and Machine Learning and Mathematics & Optimization
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio
🧭
Keyword Pioneer
— subgaussian reward
Authors
Topics
Artificial Intelligence > Bayesian & Probabilistic > Bayesian Learning
Machine Learning > Optimization & Theory > Bayesian Inference
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Optimization & Theory > Online Algorithms
Machine Learning > Optimization & Theory > Stochastic Methods
Machine Learning > Learning Types > Multi-Armed Bandits
Machine Learning > Bayesian & Probabilistic > Bayesian Optimization