Macro-Action-Based Deep Multi-Agent Reinforcement Learning

Yuchen Xiao; Joshua Hoffman; Christopher Amato

2019 CORL CoRL 2019

Macro-Action-Based Deep Multi-Agent Reinforcement Learning

Abstract

In real-world multi-robot systems, performing high-quality, collaborative behaviors requires robots to asynchronously reason about high-level action selection at varying time durations. Macro-Action Decentralized Partially Observable Markov Decision Processes (MacDec-POMDPs) provide a general framework for asynchronous decision making under uncertainty in fully cooperative multi-agent tasks. However, multi-agent deep reinforcement learning methods have only been developed for (synchronous) primitive-action problems. This paper proposes two Deep Q-Network (DQN) based methods for learning decentralized and centralized macro-action-value functions with novel macro-action trajectory replay buffers introduced for each case. Evaluations on benchmark problems and a larger domain demonstrate the advantage of learning with macro-actions over primitive-actions and the scalability of our approaches.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — asynchronous decision making

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Yuchen Xiao , Joshua Hoffman , Christopher Amato

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems Reinforcement Learning > Methods > Multi-Agent Systems Reinforcement Learning > Applications > Robotics Machine Learning > Learning Types > Multi-Agent Systems Reinforcement Learning > Applications > Multi-Agent Systems

Keywords

multi-agent reinforcement learning decentralized pomdp deep q-network multi-robot system decentralized partially observable markov decision process asynchronous decision making asynchronous decision

Download PDF

Related papers

On-Policy Robot Imitation Learning from a Converging Supervisor 2019

Learning by Cheating 2019

Object-centric Forward Modeling for Model Predictive Control 2019

Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real 2019

Combining Deep Learning and Verification for Precise Object Instance Detection 2019