Generative Adversarial Imitation Learning

Jonathan Ho; Stefano Ermon

2016 NIPS NeurIPS 2016

Generative Adversarial Imitation Learning

Abstract

Consider learning a policy from example expert behavior, without interaction with the expert or access to a reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — expert behavior

🐣 Hot Topic Early Bird — imitation learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Jonathan Ho , Stefano Ermon

Topics

Artificial Intelligence > Core AI > Agent Systems Artificial Intelligence > Learning Paradigms > Transfer Learning Machine Learning > Learning Types > Self-Supervised Learning Reinforcement Learning > Methods > Deep RL

Keywords

imitation learning policy learning inverse reinforcement learning generative adversarial network policy extraction expert behavior

Download PDF

Related papers

Bayesian Intermittent Demand Forecasting for Large Inventories 2016

Dynamic Network Surgery for Efficient DNNs 2016

Beyond Exchangeability: The Chinese Voting Process 2016

Safe and Efficient Off-Policy Reinforcement Learning 2016

Tagger: Deep Unsupervised Perceptual Grouping 2016