A Regularized Opponent Model with Maximum Entropy Objective

Zheng Tian; Ying Wen; Zhichen Gong; Faiz Punakkath; Shihao Zou; Jun Wang

2019 IJCAI IJCAI 2019

A Regularized Opponent Model with Maximum Entropy Objective

Abstract

In a single-agent setting, reinforcement learning (RL) tasks can be cast into an inference problem by introducing a binary random variable o, which stands for the "optimality". In this paper, we redefine the binary random variable o in multi-agent setting and formalize multi-agent reinforcement learning (MARL) as probabilistic inference. We derive a variational lower bound of the likelihood of achieving the optimality and name it as Regularized Opponent Model with Maximum Entropy Objective (ROMMEO). From ROMMEO, we present a novel perspective on opponent modeling and show how it can improve the performance of training agents theoretically and empirically in cooperative games. To optimize ROMMEO, we first introduce a tabular Q-iteration method ROMMEO-Q with proof of convergence. We extend the exact algorithm to complex environments by proposing an approximate version, ROMMEO-AC. We evaluate these two algorithms on the challenging iterated matrix game and differential game respectively and show that they can outperform strong MARL baselines.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🐣 Hot Topic Early Bird — multi-agent reinforcement learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Zheng Tian , Ying Wen , Zhichen Gong , Faiz Punakkath , Shihao Zou , Jun Wang

Topics

Machine Learning > Optimization & Theory > Bayesian Inference Reinforcement Learning > Methods > Multi-Agent Systems Reinforcement Learning > Applications > Game AI

Keywords

multi-agent reinforcement learning maximum entropy variational inference probabilistic inference cooperative game opponent modeling

Download PDF

Related papers

Causal Embeddings for Recommendation: An Extended Abstract 2019

Pivotal Relationship Identification: The K-Truss Minimization Problem 2019

Portioning Using Ordinal Preferences: Fairness and Efficiency 2019

Probabilistic Strategy Logic 2019

Multi-Agent Pathfinding with Continuous Time 2019