Minimax sample complexity for turn-based stochastic game

Qiwen Cui; Lin F. Yang

2021 UAI UAI 2021

Minimax sample complexity for turn-based stochastic game

Abstract

The empirical success of multi-agent reinforcement learning is encouraging, while few theoretical guarantees have been revealed. In this work, we prove that the plug-in solver approach, probably the most natural reinforcement learning algorithm, achieves minimax sample complexity for turn-based stochastic game (TBSG). Specifically, we perform planning in an empirical TBSG by utilizing a ‘simulator’ that allows sampling from arbitrary state-action pair. We show that the empirical Nash equilibrium strategy is an approximate Nash equilibrium strategy in the true TBSG and give both problem-dependent and problem-independent bound. We develop reward perturbation techniques to tackle the non-stationarity in the game and Taylor-expansion-type analysis to improve the dependence on approximation error. With these novel techniques, we prove the minimax sample complexity of turn-based stochastic game.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — turn-based stochastic game

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Qiwen Cui , Lin F. Yang

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems Machine Learning > Optimization & Theory > Learning Theory Reinforcement Learning > Methods > Multi-Agent Systems

Keywords

multi-agent reinforcement learning sample complexity nash equilibrium minimax bounds reward perturbation turn-based stochastic game

Download PDF

Related papers

Efficient greedy coordinate descent via variable partitioning 2021

Multi-output Gaussian Processes for uncertainty-aware recommender systems 2021

Constrained differentially private federated learning for low-bandwidth devices 2021

Matrix games with bandit feedback 2021

A weaker faithfulness assumption based on triple interactions 2021