Regret Minimization for Partially Observable Deep Reinforcement Learning

Peter Jin; Kurt Keutzer; Sergey Levine

ICML 2018

Abstract

Deep reinforcement learning algorithms that estimate state and state-action value functions have been shown to be effective in a variety of challenging domains, including learning control strategies from raw image pixels. However, these algorithms typically assume a fully observed state and must compensate for partial observations by using finite-length observation histories or recurrent networks. In this work, we propose a new deep reinforcement learning algorithm based on counterfactual regret minimization that iteratively updates an approximation to an advantage-like function and is robust to partially observed state. We demonstrate that this new algorithm can substantially outperform strong baseline methods on several partially observed reinforcement learning tasks: learning first-person 3D navigation in Doom and Minecraft, and acting in the presence of partially observed objects in Doom and Pong.
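For readers unfamiliar with counterfactual regret minimization, the following is a minimal tabular sketch of regret matching, the rule that family of methods is built on. It is not the paper's deep algorithm: it assumes a single decision point with known per-action values, and the function names `regret_matching_policy` and `update_regret` are hypothetical, chosen for illustration only.

```python
import numpy as np


def regret_matching_policy(cumulative_regret):
    """Return action probabilities proportional to positive cumulative regret."""
    positive = np.maximum(np.asarray(cumulative_regret, dtype=float), 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    # No action has positive regret: fall back to the uniform policy.
    return np.full(positive.shape, 1.0 / positive.size)


def update_regret(cumulative_regret, action_values):
    """Add each action's advantage over the current policy's expected value.

    `action_values` is an assumed, externally supplied estimate of the value
    of each action at this decision point (illustrative only).
    """
    regret = np.asarray(cumulative_regret, dtype=float)
    values = np.asarray(action_values, dtype=float)
    policy = regret_matching_policy(regret)
    expected_value = float(np.dot(policy, values))
    return regret + (values - expected_value)
```

Calling `update_regret` repeatedly and acting with `regret_matching_policy` shifts probability toward actions whose accumulated advantage is positive. The accumulated quantity plays a role loosely analogous to the "advantage-like function" mentioned in the abstract, under the simplifying assumptions stated above.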

Topics

Artificial Intelligence > Core AI > Agent Systems

Machine Learning > Optimization & Theory > Learning Theory

Reinforcement Learning > Methods > Deep RL

Machine Learning > Learning Types > Reinforcement Learning

Artificial Intelligence > Core AI > Reasoning

Keywords

deep reinforcement learning, value function, partially observable Markov decision process, counterfactual regret minimization, partial observability, counterfactual regret, advantage function, first-person navigation
