Deep Conservative Policy Iteration

Nino Vieillard; Olivier Pietquin; Matthieu Geist

2020 AAAI AAAI 2020

Deep Conservative Policy Iteration

Abstract

Abstract Conservative Policy Iteration (CPI) is a founding algorithm of Approximate Dynamic Programming (ADP). Its core principle is to stabilize greediness through stochastic mixtures of consecutive policies. It comes with strong theoretical guarantees, and inspired approaches in deep Reinforcement Learning (RL). However, CPI itself has rarely been implemented, never with neural networks, and only experimented on toy problems. In this paper, we show how CPI can be practically combined with deep RL with discrete actions, in an off-policy manner. We also introduce adaptive mixture rates inspired by the theory. We experiment thoroughly the resulting algorithm on the simple Cartpole problem, and validate the proposed method on a representative subset of Atari games. Overall, this work suggests that revisiting classic ADP may lead to improved and more stable deep RL algorithms.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — stochastic mixture

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Nino Vieillard , Olivier Pietquin , Matthieu Geist

Topics

Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Policy Learning Machine Learning > Learning Types > Reinforcement Learning Deep Learning > Optimization & Theory > Optimization Deep Learning > Learning Types > Reinforcement Learning Artificial Intelligence > Core AI > Reinforcement Learning

Keywords

deep reinforcement learning policy iteration approximate dynamic programming off-policy learning conservative policy iteration neural network stochastic mixture

Download PDF

Related papers

Enhancing Pointer Network for Sentence Ordering with Pairwise Ordering Predictions 2020

CopyMTL: Copy Mechanism for Joint Extraction of Entities and Relations with Multi-Task Learning 2020

Neural Simile Recognition with Cyclic Multitask Learning and Local Attention 2020

Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy 2020

Multi-Point Semantic Representation for Intent Classification 2020