Thompson Sampling and Approximate Inference

My Phan; Yasin Abbasi Yadkori; Justin Domke

NeurIPS 2019

Abstract

We study the effects of approximate inference on the performance of Thompson sampling in $k$-armed bandit problems. Thompson sampling is a successful algorithm for online decision-making but requires posterior inference, which often must be approximated in practice. We show that even small constant inference error (in $\alpha$-divergence) can lead to poor performance (linear regret) due to under-exploration (for $\alpha<1$) or over-exploration (for $\alpha>0$) by the approximation. While for $\alpha > 0$ this is unavoidable, for $\alpha \leq 0$ the regret can be improved by adding a small amount of forced exploration even when the inference error is a large constant.
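
To make the setting concrete, the following is a minimal sketch of Thompson sampling on a Bernoulli $k$-armed bandit with an optional forced-exploration step. It is an illustration under stated assumptions, not the authors' implementation: it uses exact Beta posteriors rather than an approximate posterior, and the function name `run_thompson_sampling`, the constant `explore_prob` rate, and the uniform choice of the forced arm are all hypothetical choices made for this example.

```python
import random


def run_thompson_sampling(true_means, horizon, explore_prob=0.0, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit.

    Assumptions (not from the paper): exact Beta(1, 1) priors per arm and
    a constant forced-exploration probability `explore_prob`.
    Returns (cumulative pseudo-regret, pull counts per arm).
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [1] * k  # Beta "alpha" parameters, starting from a uniform prior
    failures = [1] * k   # Beta "beta" parameters
    pulls = [0] * k
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(horizon):
        if rng.random() < explore_prob:
            # Forced exploration: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Thompson step: sample a mean for each arm from its posterior
            # and play the arm whose sampled mean is largest.
            samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
            arm = max(range(k), key=samples.__getitem__)

        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

        # Pseudo-regret: gap between the best arm's mean and the played arm's mean.
        regret += best_mean - true_means[arm]

    return regret, pulls
```

Calling `run_thompson_sampling([0.1, 0.9], 500, explore_prob=0.05, seed=1)` returns the accumulated pseudo-regret and how often each arm was played. To experiment with the effect the abstract describes, one could replace the `rng.betavariate` draw with samples from a deliberately distorted posterior and compare regret with and without forced exploration.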

Topics

Artificial Intelligence > Bayesian & Probabilistic > Bayesian Learning
Artificial Intelligence > Learning Paradigms > Meta-Learning
Machine Learning > Optimization & Theory > Bayesian Inference
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Bayesian & Probabilistic > Bayesian Inference
Machine Learning > Learning Types > Multi-Armed Bandits
Artificial Intelligence > Core AI > Decision Making

Keywords

approximate inference; online decision-making; Thompson sampling; multi-armed bandit; regret bound

Related papers

Two Generator Game: Learning to Sample via Linear Goodness-of-Fit Test 2019

Metalearned Neural Memory 2019

Model Similarity Mitigates Test Set Overuse 2019

Continual Unsupervised Representation Learning 2019

Reinforcement Learning with Convex Constraints 2019