End-to-End Differentiable Adversarial Imitation Learning

Nir Baram; Oron Anschel; Itai Caspi; Shie Mannor

ICML 2017

Abstract

Generative Adversarial Networks (GANs) have been successfully applied to the problem of policy imitation in a model-free setup. However, when the generative model is a stochastic policy, the computation graph of the GAN is no longer differentiable end-to-end, which requires the use of high-variance gradient estimation. In this paper, we introduce the Model-based Generative Adversarial Imitation Learning (MGAIL) algorithm. We show how to use a forward model to make the computation fully differentiable, which enables training policies using the exact gradient of the discriminator. The resulting algorithm trains competent policies using fewer expert samples and fewer interactions with the environment. We test it on both discrete and continuous action domains and report results that surpass the state of the art.
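The idea of passing the discriminator's gradient back to the policy through a forward model can be illustrated with a toy scalar sketch. Everything below is an assumption made for illustration and is not taken from the paper: the linear policy, the hand-written forward model `s + 0.5 * a`, the logistic discriminator that scores only the next state, and all function names. The policy noise `eps` is sampled outside the policy (a reparameterized policy), so the action is a differentiable function of the policy parameter `theta` and the chain rule can be applied from the discriminator output back to `theta`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def policy(theta, s, eps, sigma=0.1):
    # Hypothetical reparameterized stochastic policy: the noise eps is an
    # input, so the action is differentiable with respect to theta.
    return theta * s + sigma * eps


def forward_model(s, a):
    # Hypothetical differentiable forward model (assumed known here;
    # in practice it would be learned from environment interactions).
    return s + 0.5 * a


def discriminator(w, s_next):
    # Hypothetical logistic discriminator scoring the predicted next state.
    return sigmoid(w * s_next)


def objective(theta, w, s, eps):
    # Log-discriminator score of the state reached by following the policy.
    a = policy(theta, s, eps)
    return math.log(discriminator(w, forward_model(s, a)))


def pathwise_grad(theta, w, s, eps):
    # Chain rule: discriminator -> forward model -> policy -> theta.
    a = policy(theta, s, eps)
    d = discriminator(w, forward_model(s, a))
    dlogd_dsnext = w * (1.0 - d)  # d/dx log sigmoid(w * x)
    dsnext_da = 0.5               # derivative of forward_model w.r.t. a
    da_dtheta = s                 # derivative of policy w.r.t. theta
    return dlogd_dsnext * dsnext_da * da_dtheta
```

For a fixed noise sample this gradient is exact, so it can be checked against a finite-difference estimate of `objective`. A score-function (likelihood-ratio) estimator, by contrast, would not differentiate through the discriminator at all and would typically have higher variance.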

Topics

Machine Learning > Learning Types > Adversarial Learning
Reinforcement Learning > Methods > Policy Learning

Keywords

policy gradient, gradient estimation, model-based reinforcement learning, differentiable programming, generative adversarial network, adversarial imitation learning
