Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts

Weinan Zhang; Xihuai Wang; Jian Shen; Ming Zhou

2021 IJCAI IJCAI 2021

Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts

Abstract

This paper investigates the model-based methods in multi-agent reinforcement learning (MARL). We specify the dynamics sample complexity and the opponent sample complexity in MARL, and conduct a theoretic analysis of return discrepancy upper bound. To reduce the upper bound with the intention of low sample complexity during the whole learning process, we propose a novel decentralized model-based MARL method, named Adaptive Opponent-wise Rollout Policy Optimization (AORPO). In AORPO, each agent builds its multi-agent environment model, consisting of a dynamics model and multiple opponent models, and trains its policy with the adaptive opponent-wise rollout. We further prove the theoretic convergence of AORPO under reasonable assumptions. Empirical experiments on competitive and cooperative tasks demonstrate that AORPO can achieve improved sample efficiency with comparable asymptotic performance over the compared MARL methods.

🐣 Hot Topic Early Bird — multi-agent reinforcement learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

Authors

Weinan Zhang , Xihuai Wang , Jian Shen , Ming Zhou

Topics

Reinforcement Learning > Methods > Policy Learning Reinforcement Learning > Methods > Multi-Agent Systems Machine Learning > Learning Types > Reinforcement Learning Machine Learning > Learning Types > Multi-Agent Systems Reinforcement Learning > Applications > Multi-Agent Systems

Keywords

multi-agent reinforcement learning sample efficiency policy optimization model-based reinforcement learning opponent modeling

Download PDF

Related papers

Type Anywhere You Want: An Introduction to Invisible Mobile Keyboard 2021

Guaranteeing Maximin Shares: Some Agents Left Behind 2021

Surprisingly Popular Voting Recovers Rankings, Surprisingly! 2021

Strategyproof Randomized Social Choice for Restricted Sets of Utility Functions 2021

Diversity in Kemeny Rank Aggregation: A Parameterized Approach 2021