Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling

Emilie Kaufmann; Wouter M. Koolen; Aurélien Garivier

NeurIPS 2018

Abstract

Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-problem in planning, game tree search and reinforcement learning. We formalize this learning task as the problem of sequentially testing how the minimum mean among a finite set of distributions compares to a given threshold. We develop refined non-asymptotic lower bounds, which show that optimality mandates very different sampling behavior for a low vs high true minimum. We show that Thompson Sampling and the intuitive Lower Confidence Bounds policy each nail only one of these cases. We develop a novel approach that we call Murphy Sampling (MS). Even though it entertains exclusively low true minima, we prove that MS is optimal for both possibilities. We then design advanced self-normalized deviation inequalities, fueling more aggressive stopping rules. We complement our theoretical guarantees by experiments showing that MS works best in practice.
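To make the sampling and stopping rules described above more concrete, here is a minimal Python sketch. It rests on several assumptions that are ours, not the paper's: unit-variance Gaussian arms, a flat prior (so arm a's posterior is N(μ̂_a, 1/N_a)), rejection sampling to restrict posterior draws to the low-minimum event min_a θ_a < γ, standard Gaussian generalized likelihood ratio (GLR) statistics for the two hypotheses, and a naive placeholder threshold log(1/δ) instead of the self-normalized deviation thresholds the abstract mentions. The function names `murphy_sample_arm`, `glr_statistics` and `murphy_test` are hypothetical.

```python
import math
import random


def murphy_sample_arm(counts, sums, gamma, rng, max_tries=10_000):
    """Choose the next arm with a rejection-sampling sketch of Murphy Sampling.

    Assumes unit-variance Gaussian arms and a flat prior, so arm a has
    posterior N(sums[a] / counts[a], 1 / counts[a]). Joint draws are repeated
    until the sampled minimum lies below gamma (the low-minimum event), and
    the arm with the smallest sampled mean is returned.
    """
    means = [s / n for s, n in zip(sums, counts)]
    stds = [1.0 / math.sqrt(n) for n in counts]
    theta = None
    for _ in range(max_tries):
        theta = [rng.gauss(m, sd) for m, sd in zip(means, stds)]
        if min(theta) < gamma:
            break
    # Hypothetical fallback: if no draw hits the event within max_tries,
    # the last unconditioned draw is used (i.e. plain Thompson Sampling).
    return min(range(len(theta)), key=lambda a: theta[a])


def glr_statistics(counts, sums, gamma):
    """Gaussian (unit-variance) GLR statistics for the two hypotheses.

    low:  evidence that the minimum mean is below gamma
          (sum over arms whose empirical mean is below gamma).
    high: evidence that the minimum mean is above gamma
          (the weakest arm above gamma; 0 if any arm is at or below it).
    """
    low, high = 0.0, math.inf
    for n, s in zip(counts, sums):
        mu_hat = s / n
        d = n * (mu_hat - gamma) ** 2 / 2.0
        if mu_hat < gamma:
            low += d
            high = 0.0
        else:
            high = min(high, d)
    return low, high


def murphy_test(arm_means, gamma, delta, rng, max_rounds=100_000):
    """Run the sketch on simulated N(arm_means[a], 1) arms; return (decision, samples)."""
    k = len(arm_means)
    counts, sums = [0] * k, [0.0] * k
    for a in range(k):  # pull every arm once so each posterior is proper
        counts[a] += 1
        sums[a] += rng.gauss(arm_means[a], 1.0)
    # Naive placeholder threshold: NOT the paper's calibrated stopping threshold.
    threshold = math.log(1.0 / delta)
    for t in range(k, max_rounds):  # t = total samples drawn so far
        low, high = glr_statistics(counts, sums, gamma)
        if low > threshold:
            return "min < gamma", t
        if high > threshold:
            return "min > gamma", t
        a = murphy_sample_arm(counts, sums, gamma, rng)
        counts[a] += 1
        sums[a] += rng.gauss(arm_means[a], 1.0)
    return "undecided", max_rounds


if __name__ == "__main__":
    rng = random.Random(0)
    print(murphy_test([0.1, 0.6, 0.9], gamma=0.5, delta=0.05, rng=rng))
```

Replacing the conditioned draw with a single unconditioned draw turns the sampler into plain Thompson Sampling, which makes the two rules easy to compare in simulation. Because the placeholder threshold is not calibrated, this sketch carries no δ-correctness guarantee.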

Topics

Artificial Intelligence > Core AI > Planning
Machine Learning > Core Methods > Classification
Machine Learning > Bayesian & Probabilistic > Bayesian Inference
Machine Learning > Learning Types > Multi-Armed Bandits
Machine Learning > Learning Types > Exploration

Keywords

mean estimation, sequential testing, Thompson sampling, multi-armed bandit, sampling method, optimal stopping
