Learning Deep Decentralized Policy Network by Collective Rewards for Real-Time Combat Game

Peixi Peng; Junliang Xing; Lili Cao; Lisen Mu; Chang Huang

IJCAI 2019

Abstract

The task of a real-time combat game is to coordinate multiple units to defeat enemies controlled by a given opponent in a real-time combat scenario. Designing a high-level Artificial Intelligence (AI) program for this task is difficult because of its extremely large state-action space and real-time requirements. This paper formulates the task as a collective decentralized partially observable Markov decision process and designs a Deep Decentralized Policy Network (DDPN) to model the policies. To train DDPN effectively, a novel two-stage learning algorithm is proposed that combines imitation learning from the opponent with reinforcement learning by no-regret dynamics. Extensive experimental results on various combat scenarios indicate that the proposed method can defeat different opponent models and significantly outperforms many state-of-the-art approaches.

Authors

Peixi Peng, Junliang Xing, Lili Cao, Lisen Mu, Chang Huang

Topics

Reinforcement Learning > Methods > Policy Learning
Reinforcement Learning > Methods > Multi-Agent Systems
Reinforcement Learning > Applications > Game AI

Keywords

imitation learning; partially observable Markov decision process; deep decentralized policy; collective reward; no-regret dynamic; real-time combat
