Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

Tor Lattimore

2016 COLT COLT 2016

Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

Abstract

I prove near-optimal frequentist regret guarantees for the finite-horizon Gittins index strategy for multi-armed bandits with Gaussian noise and prior. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is an improvement on existing algorithms with finite-time regret guarantees such as UCB and Thompson sampling.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Mathematics & Optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Tor Lattimore

Topics

Artificial Intelligence > Bayesian & Probabilistic > Bayesian Learning Machine Learning > Optimization & Theory > Stochastic Processes Mathematics & Optimization > Optimization > Online Algorithms

Keywords

bayesian inference gaussian process multi-armed bandit regret bound gittins index

Download PDF

Related papers

Open Problem: Approximate Planning of POMDPs in the class of Memoryless Policies 2016

Open Problem: Best Arm Identification: Almost Instance-Wise Optimality and the Gap Entropy Conjecture 2016

Open Problem: Kernel methods on manifolds and metric spaces. What is the probability of a positive definite geodesic exponential kernel? 2016

Learning and Testing Junta Distributions 2016

Monte Carlo Markov Chain Algorithms for Sampling Strongly Rayleigh Distributions and Determinantal Point Processes 2016