An Information-Theoretic Analysis of Thompson Sampling

Daniel Russo; Benjamin Van Roy

JMLR 2016

Abstract

We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bounds that scale with the entropy of the optimal-action distribution. This strengthens preexisting results and yields new insight into how information improves performance.
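As a rough illustration of the setting, the sketch below runs Thompson sampling on a Bernoulli multi-armed bandit and evaluates an entropy-based bound on Bayesian regret. It is not taken from the paper: the function names, the Beta(1, 1) priors, and the example arm means are assumptions made for illustration. The bound is assumed to take the form sqrt(|A| · H(A*) · T / 2) for finite action sets with rewards in [0, 1]. It bounds regret averaged over the prior, not regret on any single fixed instance.

```python
import math
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return cumulative expected regret.

    true_means: hypothetical success probability of each arm (unknown to the learner).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    alpha = [1] * n_arms  # Beta(1, 1) prior: one pseudo-success per arm (assumption)
    beta = [1] * n_arms   # and one pseudo-failure per arm
    best_mean = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Sample one plausible mean per arm from its posterior, then act greedily
        # with respect to the sample. This plays each arm with the posterior
        # probability that it is optimal.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        # Partial feedback: only the played arm's posterior is updated.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        regret += best_mean - true_means[arm]
    return regret


def entropy(probs):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_regret_bound(n_arms, optimal_action_entropy, horizon):
    """Assumed bound form: sqrt(|A| * H(A*) * T / 2) for rewards in [0, 1]."""
    return math.sqrt(0.5 * n_arms * optimal_action_entropy * horizon)


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]  # hypothetical instance
    horizon = 500
    # With identical independent priors, each arm is equally likely to be optimal.
    h_opt = entropy([1 / len(means)] * len(means))
    print("regret on this instance:", thompson_sampling(means, horizon))
    print("entropy-based bound on Bayesian regret:",
          entropy_regret_bound(len(means), h_opt, horizon))
```

When the prior makes one action far more likely to be optimal than the others, H(A*) falls below log |A| and the bound tightens. This is one way to read the abstract's remark that prior information improves performance.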

Topics

Artificial Intelligence > Bayesian & Probabilistic > Bayesian Learning
Mathematics & Optimization > Mathematics > Information Theory
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Optimization & Theory > Information Theory
Machine Learning > Bayesian & Probabilistic > Bayesian Inference
Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

information theory, online optimization, Thompson sampling, partial feedback, multi-armed bandit, regret bound

Related papers

Trend Filtering on Graphs 2016

Causal Inference through a Witness Protection Program 2016

A Characterization of Linkage-Based Hierarchical Clustering 2016

How to Center Deep Boltzmann Machines 2016

Minimax Rates in Permutation Estimation for Feature Matching 2016