Linear Thompson Sampling Revisited

Marc Abeille; Alessandro Lazaric

AISTATS 2017

Abstract

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $O(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on the functioning of TS. We leverage the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated with \textit{optimistic} parameters controls it. Thus we show that TS can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems.
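As a rough illustration of the randomized scheme the abstract describes, the sketch below runs a basic linear TS loop: it keeps a regularized least-squares estimate, perturbs it with Gaussian noise shaped by the inverse design matrix, and plays the arm that is optimal for the perturbed parameter. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `linear_thompson_sampling`, the fixed scaling `beta`, the regularizer `lam`, the Gaussian sampling distribution, and the finite arm set are all illustrative choices that the abstract does not specify.

```python
import numpy as np


def linear_thompson_sampling(arms, theta_star, horizon, lam=1.0, beta=1.0,
                             noise_std=0.1, seed=0):
    """Run a basic linear TS loop; return cumulative pseudo-regret per round.

    arms:       (K, d) array of arm feature vectors (assumed finite arm set).
    theta_star: (d,) true parameter, used only to simulate rewards and regret.
    beta:       hypothetical fixed scaling of the perturbation (an assumption;
                a regret analysis would use a time-dependent confidence radius).
    """
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = lam * np.eye(d)           # regularized design matrix
    b = np.zeros(d)               # running sum of reward * feature
    best_value = np.max(arms @ theta_star)
    regret = np.zeros(horizon)
    total = 0.0
    for t in range(horizon):
        # Regularized least-squares estimate of the parameter.
        theta_hat = np.linalg.solve(V, b)
        # Gaussian perturbation with covariance beta^2 * V^{-1}:
        # with V = L L^T, the vector L^{-T} eta has covariance V^{-1}.
        L = np.linalg.cholesky(V)
        eta = rng.standard_normal(d)
        theta_tilde = theta_hat + beta * np.linalg.solve(L.T, eta)
        # Play the arm that is optimal for the sampled parameter.
        x = arms[np.argmax(arms @ theta_tilde)]
        reward = x @ theta_star + noise_std * rng.standard_normal()
        # Update the sufficient statistics.
        V += np.outer(x, x)
        b += reward * x
        total += best_value - x @ theta_star
        regret[t] = total
    return regret


if __name__ == "__main__":
    arms = np.eye(3)
    theta_star = np.array([1.0, 0.5, 0.0])
    print(linear_thompson_sampling(arms, theta_star, horizon=200)[-1])
```

When the sampled parameter happens to be optimistic, meaning the chosen arm's value under `theta_tilde` is at least the true optimal value, the round behaves like a UCB-style step. The perturbation is there to make that happen with a fixed probability in each round.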

Authors

Marc Abeille, Alessandro Lazaric

Topics

Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Optimization & Theory > Stochastic Methods

Keywords

stochastic optimization, Thompson sampling, regret bound, linear bandit, optimistic parameter
