Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

Ian Osband; Benjamin Van Roy

2017 ICML ICML 2017

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

Abstract

Computational results demonstrate that posterior sampling for reinforcement learning (PSRL) dramatically outperforms existing algorithms driven by optimism, such as UCRL2. We provide insight into the extent of this performance boost and the phenomenon that drives it. We leverage this insight to establish an $\tilde{O}(H\sqrt{SAT})$ Bayesian regret bound for PSRL in finite-horizon episodic Markov decision processes. This improves upon the best previous Bayesian regret bound of $\tilde{O}(H S \sqrt{AT})$ for any reinforcement learning algorithm. Our theoretical results are supported by extensive empirical evaluation.

❓ The Questioner

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🐣 Hot Topic Early Bird — markov decision process

🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Reinforcement Learning, Robotics, Security & Privacy

Authors

Ian Osband , Benjamin Van Roy

Topics

Machine Learning > Optimization & Theory > Bayesian Inference Reinforcement Learning > Methods > Deep RL Machine Learning > Bayesian & Probabilistic > Bayesian Inference

Keywords

reinforcement learning posterior sampling markov decision process bayesian regret regret bound

Download PDF

Related papers

Bottleneck Conditional Density Estimation 2017

Constrained Policy Optimization 2017

Near-Optimal Design of Experiments via Regret Minimization 2017

Input Convex Neural Networks 2017

An Efficient, Sparsity-Preserving, Online Algorithm for Low-Rank Approximation 2017