The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits

Tor Lattimore; Csaba Szepesvári

2017 AISTATS AISTATS 2017

The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits

Abstract

Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications. Current approaches focus on generalising existing techniques for finite-armed bandits, notably the otimism principle and Thompson sampling. Prior analysis has mostly focussed on the worst-case setting. We analyse the asymptotic regret and show matching upper and lower bounds on what is achievable. Surprisingly, our results show that no algorithm based on optimism or Thompson sampling will ever achieve the optimal rate. In fact, they can be arbitrarily far from optimal, even in very simple cases. This is a disturbing result because these techniques are standard tools that are widely used for sequential optimisation, for example, generalised linear bandits and reinforcement learning.

❓ The Questioner

🧭 Keyword Pioneer — optimism principle

🐣 Hot Topic Early Bird — thompson sampling

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Tor Lattimore , Csaba Szepesvári

Topics

Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Optimization & Theory > Theory Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

worst-case analysis asymptotic analysis thompson sampling multi-armed bandit regret bound linear bandit optimism principle asymptotic regret

Download PDF

Related papers

Conditions beyond treewidth for tightness of higher-order LP relaxations 2017

Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach 2017

Tensor-Dictionary Learning with Deep Kruskal-Factor Analysis 2017

A Sub-Quadratic Exact Medoid Algorithm 2017

Performance Bounds for Graphical Record Linkage 2017