Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

Christoph Dann; Tor Lattimore; Emma Brunskill

2017 NIPS NeurIPS 2017

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

Abstract

Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version may be used to derive high probability regret guarantees and so forms a bridge between the two setups that has been missing in the literature. We demonstrate the benefits of the new framework for finite-state episodic MDPs with a new algorithm that is Uniform-PAC and simultaneously achieves optimal regret and PAC guarantees except for a factor of the horizon.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — uniform pac bound

🐣 Hot Topic Early Bird — pac learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Christoph Dann , Tor Lattimore , Emma Brunskill

Topics

Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Optimization & Theory > Theory Reinforcement Learning > Methods > Deep RL

Keywords

pac learning markov decision process regret bound episodic reinforcement learning uniform pac bound

Download PDF

Related papers

High-Order Attention Models for Visual Question Answering 2017

Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization 2017

Premise Selection for Theorem Proving by Deep Graph Embedding 2017

Neural Program Meta-Induction 2017

Safe and Nested Subgame Solving for Imperfect-Information Games 2017