Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games

Yulai Zhao; Yuandong Tian; Jason Lee; Simon Du

AISTATS 2022

Abstract

Policy-based methods with function approximation are widely used for solving two-player zero-sum games with large state and/or action spaces. However, it remains elusive how to obtain optimization and statistical guarantees for such algorithms. We present a new policy optimization algorithm with function approximation and prove that under standard regularity conditions on the Markov game and the function approximation class, our algorithm finds a near-optimal policy within a polynomial number of samples and iterations. To our knowledge, this is the first provably efficient policy optimization algorithm with function approximation that solves two-player zero-sum Markov games.

Authors

Yulai Zhao, Yuandong Tian, Jason Lee, Simon Du

Topics

Reinforcement Learning > Methods > Policy Learning

Reinforcement Learning > Methods > Multi-Agent Systems

Reinforcement Learning > Applications > Game AI

Mathematics & Optimization > Optimization > Game Theory

Reinforcement Learning > Applications > Multi-Agent Systems

Keywords

policy optimization, game theory, function approximation, zero-sum game, Markov game, multi-agent system
