Adversarial Policy Learning in Two-player Competitive Games

Wenbo Guo; Xian Wu; Sui Huang; Xinyu Xing

ICML 2021

Abstract

In a two-player deep reinforcement learning task, recent work shows that an attacker can learn an adversarial policy that causes a target agent to perform poorly and even behave in an undesired way. However, the efficacy of this attack relies heavily on the zero-sum assumption made about the two-player game. In this work, we propose a new adversarial learning algorithm. It addresses the problem by resetting the optimization goal in the learning process and designing a new surrogate optimization function. Our experiments show that our method significantly improves adversarial agents' exploitability compared with the state-of-the-art attack. We also discover that our method can equip an agent with the ability to abuse the target game's unfairness. Finally, we show that agents adversarially retrained against our adversarial agents obtain stronger adversary resistance.
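
The abstract's point about the zero-sum assumption can be illustrated with a toy example. When the victim's reward is exactly the negative of the attacker's, maximizing the attacker's own reward and minimizing the victim's reward are the same goal; in a non-zero-sum game they can pull the attacker toward different behaviors. The sketch below is a hypothetical illustration only, not the paper's algorithm or its surrogate function: the payoff tables, the frozen victim policy, and the "combined" objective (own reward minus victim reward) are all assumptions chosen for the demo. It runs exact softmax policy-gradient ascent for a 3-action attacker against a fixed 2-action victim under three different goals.

```python
import math

# Hypothetical 3-action adversary vs. a frozen 2-action victim (toy assumption).
# ADV_REWARD[a][v]: adversary's reward; VICTIM_REWARD[a][v]: victim's reward.
# The game is NOT zero-sum: ADV_REWARD[a][v] + VICTIM_REWARD[a][v] != 0.
ADV_REWARD = [[4.0, 2.0], [0.0, 2.0], [3.0, 1.0]]
VICTIM_REWARD = [[2.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
VICTIM_POLICY = [0.5, 0.5]  # victim is held fixed while the adversary trains


def expected(reward, a):
    """Expected reward of adversary action a against the frozen victim policy."""
    return sum(p * r for p, r in zip(VICTIM_POLICY, reward[a]))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(objective, steps=2000, lr=0.1):
    """Exact softmax policy-gradient ascent on a per-action objective value."""
    values = [objective(a) for a in range(len(ADV_REWARD))]
    logits = [0.0] * len(values)
    for _ in range(steps):
        pi = softmax(logits)
        baseline = sum(p * v for p, v in zip(pi, values))
        # Gradient of E_pi[value] w.r.t. logit_i is pi_i * (value_i - baseline).
        logits = [logit + lr * p * (v - baseline)
                  for logit, p, v in zip(logits, pi, values)]
    return softmax(logits)


objectives = {
    # Maximize own reward: sufficient for the attack only if the game is zero-sum.
    "own": lambda a: expected(ADV_REWARD, a),
    # Minimize the victim's reward, ignoring the adversary's own reward.
    "victim": lambda a: -expected(VICTIM_REWARD, a),
    # Hypothetical combined goal: own reward minus the victim's reward.
    "combined": lambda a: expected(ADV_REWARD, a) - expected(VICTIM_REWARD, a),
}

for name, obj in objectives.items():
    pi = train(obj)
    best = max(range(len(pi)), key=pi.__getitem__)
    print(name, best, [round(p, 3) for p in pi])
```

In this toy game the three goals settle on three different actions (0, 1, and 2 respectively), so an attacker trained only on its own reward would not be the one that hurts the victim most. Under a zero-sum payoff table all three objectives would rank the actions identically.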

Topics

Artificial Intelligence > Core AI > Game AI; Artificial Intelligence > Core AI > Multi-Agent Systems; Machine Learning > Learning Types > Adversarial Learning; Reinforcement Learning > Applications > Game AI; Artificial Intelligence > Core AI > Adversarial Learning

Keywords

deep reinforcement learning, reinforcement learning, adversarial learning, policy optimization, policy gradient, adversarial training, competitive game, two-player game, adversarial policy, surrogate optimization