Multi-Agent Adversarial Inverse Reinforcement Learning

Lantao Yu; Jiaming Song; Stefano Ermon

2019 ICML ICML 2019

Multi-Agent Adversarial Inverse Reinforcement Learning

Abstract

Reinforcement learning agents are prone to undesired behaviors due to reward mis-specification. Finding a set of reward functions to properly guide agent behaviors is particularly challenging in multi-agent scenarios. Inverse reinforcement learning provides a framework to automatically acquire suitable reward functions from expert demonstrations. Its extension to multi-agent settings, however, is difficult due to the more complex notions of rational behaviors. In this paper, we propose MA-AIRL, a new framework for multi-agent inverse reinforcement learning, which is effective and scalable for Markov games with high-dimensional state-action space and unknown dynamics. We derive our algorithm based on a new solution concept and maximum pseudolikelihood estimation within an adversarial reward learning framework. In the experiments, we demonstrate that MA-AIRL can recover reward functions that are highly correlated with the ground truth rewards, while significantly outperforms prior methods in terms of policy imitation.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — policy imitation

🐣 Hot Topic Early Bird — multi-agent system

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Lantao Yu , Jiaming Song , Stefano Ermon

Topics

Artificial Intelligence > Core AI > Agent Systems Machine Learning > Learning Types > Adversarial Learning Reinforcement Learning > Methods > Multi-Agent Systems

Keywords

adversarial learning inverse reinforcement learning reward function markov game policy imitation multi-agent system

Download PDF

Related papers

Bayesian leave-one-out cross-validation for large data 2019

A Block Coordinate Descent Proximal Method for Simultaneous Filtering and Parameter Estimation 2019

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks 2019

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously 2019

Improved Convergence for $\ell_1$ and $\ell_∞$ Regression via Iteratively Reweighted Least Squares 2019