MFVFD: A Multi-Agent Q-Learning Approach to Cooperative and Non-Cooperative Tasks

Tianhao Zhang; Qiwei Ye; Jiang Bian; Guangming Xie; Tie-yan Liu

IJCAI 2021

Abstract

Value function decomposition (VFD) methods under the popular paradigm of centralized training and decentralized execution (CTDE) have advanced multi-agent reinforcement learning. However, existing VFD methods start from decomposing a group's value function and therefore solve only cooperative tasks. Building instead on the decomposition of individual value functions, we propose MFVFD, a novel multi-agent Q-learning approach based on mean-field theory for solving both cooperative and non-cooperative tasks. Our analysis on the Hawk-Dove and Nonmonotonic Cooperation matrix games evaluates MFVFD's convergent solution. Empirical studies on challenging mixed cooperative-competitive tasks, where hundreds of agents coexist, demonstrate that MFVFD significantly outperforms existing baselines.
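For readers unfamiliar with the Hawk-Dove game named above, the following is a minimal sketch of its textbook form, not the paper's setup or the MFVFD algorithm. The payoff parameterization (resource value `V`, fight cost `C`), the numeric values, and the helper names `hawk_dove_payoffs` and `expected_payoffs` are all assumptions made for illustration. It checks the standard result that, when `V < C`, an opponent playing Hawk with probability `V / C` leaves a player indifferent between Hawk and Dove.

```python
import numpy as np


def hawk_dove_payoffs(v: float, c: float) -> np.ndarray:
    """Row player's payoffs; rows and columns are ordered (Hawk, Dove).

    Textbook parameterization (an assumption, not taken from the paper):
    two Hawks split the resource and the fight cost, a Hawk takes everything
    from a Dove, and two Doves share the resource.
    """
    return np.array([
        [(v - c) / 2.0, v],
        [0.0, v / 2.0],
    ])


def expected_payoffs(payoffs: np.ndarray, p_hawk: float) -> np.ndarray:
    """Expected payoff of each pure action against a mixed opponent."""
    opponent = np.array([p_hawk, 1.0 - p_hawk])
    return payoffs @ opponent


# Illustrative values only; the paper's payoffs are not given here.
V, C = 2.0, 4.0
A = hawk_dove_payoffs(V, C)

# Mixed equilibrium of the textbook game when V < C.
p_star = V / C
hawk_value, dove_value = expected_payoffs(A, p_star)

print(f"P(Hawk) = {p_star}, Hawk payoff = {hawk_value}, Dove payoff = {dove_value}")
```

The game is non-cooperative in the sense that each player maximizes its own payoff, so no single group value function describes both players' incentives.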

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems
Reinforcement Learning > Methods > Multi-Agent Systems
Reinforcement Learning > Applications > Game AI
Machine Learning > Learning Types > Reinforcement Learning
Mathematics & Optimization > Optimization > Game Theory

Keywords

multi-agent reinforcement learning, game theory, centralized training decentralized execution, value function decomposition, mean-field theory, multi-agent Q-learning, Hawk-Dove game, non-cooperative task
