On Oracle-Efficient PAC RL with Rich Observations

Christoph Dann; Nan Jiang; Akshay Krishnamurthy; Alekh Agarwal; John Langford; Robert E. Schapire

NeurIPS 2018

Abstract

We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation -- accessing policy and value function classes exclusively through standard optimization primitives -- and therefore represent computationally efficient alternatives to prior algorithms that require enumeration. With stochastic hidden state dynamics, we prove that the only known sample-efficient algorithm, OLIVE, cannot be implemented in the oracle model. We also present several examples that illustrate fundamental challenges of tractable PAC reinforcement learning in such general settings.

Authors

Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

Topics

Machine Learning > Optimization & Theory > Learning Theory

Reinforcement Learning > Methods > Deep RL

Machine Learning > Learning Types > Reinforcement Learning

Machine Learning > Optimization & Theory > Sample Complexity

Keywords

reinforcement learning, PAC learning, value iteration, oracle-efficient algorithm, rich observation, sample-efficient algorithm, oracle model, hidden state dynamics, PAC reinforcement learning
