MAVEN: Multi-Agent Variational Exploration

Anuj Mahajan; Tabish Rashid; Mikayel Samvelyan; Shimon Whiteson

2019 NIPS NeurIPS 2019

MAVEN: Multi-Agent Variational Exploration

Abstract

Centralised training with decentralised execution is an important setting for cooperative deep multi-agent reinforcement learning due to communication constraints during execution and computational tractability in training. In this paper, we analyse value-based methods that are known to have superior performance in complex environments. We specifically focus on QMIX, the current state-of-the-art in this domain. We show that the representation constraints on the joint action-values introduced by QMIX and similar methods lead to provably poor exploration and suboptimality. Furthermore, we propose a novel approach called MAVEN that hybridises value and policy-based methods by introducing a latent space for hierarchical control. The value-based agents condition their behaviour on the shared latent variable controlled by a hierarchical policy. This allows MAVEN to achieve committed, temporally extended exploration, which is key to solving complex multi-agent tasks. Our experimental results show that MAVEN achieves significant performance improvements on the challenging SMAC domain.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — centralized training

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Anuj Mahajan , Tabish Rashid , Mikayel Samvelyan , Shimon Whiteson

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Multi-Agent Systems Machine Learning > Learning Types > Multi-Agent Systems Reinforcement Learning > Applications > Multi-Agent Systems Machine Learning > Learning Types > Exploration

Keywords

multi-agent reinforcement learning value function temporal abstraction cooperative multi-agent cooperative game value-based method hierarchical control hierarchical policy centralized training temporal extension

Download PDF

Related papers

Two Generator Game: Learning to Sample via Linear Goodness-of-Fit Test 2019

Metalearned Neural Memory 2019

Model Similarity Mitigates Test Set Overuse 2019

Continual Unsupervised Representation Learning 2019

Reinforcement Learning with Convex Constraints 2019