Solving Zero-Sum Markov Games with Continuous State via Spectral Dynamic Embedding

Chenhao Zhou; Zebang Shen; Chao Zhang; Hanbin Zhao; Hui Qian

NeurIPS 2024

Abstract

In this paper, we propose a provably efficient natural policy gradient algorithm called Spectral Dynamic Embedding Policy Optimization (SDEPO) for two-player zero-sum stochastic Markov games with a continuous state space and a finite action space. In the policy evaluation procedure of our algorithm, a novel kernel embedding method is employed to construct finite-dimensional linear approximations to the state-action value function. We explicitly analyze the approximation error in policy evaluation and show that SDEPO achieves $\tilde{O}\big(\frac{1}{(1-\gamma)^3\epsilon}\big)$ last-iterate convergence to an $\epsilon$-optimal Nash equilibrium, a bound that is independent of the cardinality of the state space. This complexity result matches the best-known results for global convergence of policy gradient algorithms in the single-agent setting. We also propose a practical variant of SDEPO that handles continuous action spaces, and empirical results demonstrate the practical superiority of the proposed method.
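The abstract names a kernel embedding but does not spell it out. As a rough illustration of why a kernel embedding can turn a continuous-state Q-function into a finite-dimensional linear one, the sketch below uses random Fourier features for a Gaussian kernel. It assumes Gaussian-noise dynamics s' = f(s, a) + eps with eps ~ N(0, sigma^2 I). That assumption, and the helper names `sample_rff`, `rff_map`, `gaussian_kernel` and `linear_q`, are hypothetical choices made for this sketch. It is not the authors' SDEPO implementation.

```python
import numpy as np


def sample_rff(dim, num_features, sigma, rng):
    """Draw random Fourier features for the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).

    By Bochner's theorem this kernel's spectral density is N(0, sigma^-2 I),
    so frequencies are Gaussian and phases are uniform on [0, 2*pi).
    """
    omegas = rng.normal(scale=1.0 / sigma, size=(num_features, dim))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return omegas, phases


def rff_map(x, omegas, phases):
    """phi(x) = sqrt(2/m) * cos(omegas @ x + phases), so phi(x) @ phi(y) ~= k(x, y)."""
    m = omegas.shape[0]
    return np.sqrt(2.0 / m) * np.cos(omegas @ x + phases)


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel, used here only to check the approximation."""
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))


def linear_q(reward, mean_next_state, w, gamma, omegas, phases):
    """Hypothetical linear Q-model under the ASSUMED dynamics
    s' = f(s, a) + eps, eps ~ N(0, sigma^2 I).

    Then p(s' | s, a) is proportional to k(s', f(s, a)), so
    E[V(s') | s, a] ~= phi(f(s, a)) @ w (the normalizing constant is absorbed
    into w), giving Q(s, a) ~= r(s, a) + gamma * phi(f(s, a)) @ w.
    Here `w` is simply an input; in practice it would be estimated from samples.
    """
    return reward + gamma * float(rff_map(mean_next_state, omegas, phases) @ w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sigma = 1.0
    omegas, phases = sample_rff(dim=2, num_features=20_000, sigma=sigma, rng=rng)
    x, y = np.array([0.3, -0.2]), np.array([-0.1, 0.4])
    approx = float(rff_map(x, omegas, phases) @ rff_map(y, omegas, phases))
    print(f"exact {gaussian_kernel(x, y, sigma):.4f}  approx {approx:.4f}")
```

The design point is that the feature map depends on (s, a) only through f(s, a). The part that depends on the value function is then a single weight vector. This sketch makes Q linear in a fixed, finite feature vector regardless of how many states there are, which mirrors the state-cardinality-independent flavor of the bound stated in the abstract.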

Topics

Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Policy Learning
Reinforcement Learning > Methods > Multi-Agent Systems
Machine Learning > Learning Types > Multi-Agent Systems
Mathematics & Optimization > Optimization > Game Theory
Artificial Intelligence > Core AI > Game Theory
Reinforcement Learning > Applications > Multi-Agent Systems

Keywords

policy optimization, policy gradient, natural policy gradient, Nash equilibrium, continuous state, continuous state space, kernel embedding, zero-sum game, spectral method, Markov game, zero-sum Markov game, spectral dynamic embedding

Related papers

SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers 2024

Training for Stable Explanation for Free 2024

NeuralSolver: Learning Algorithms For Consistent and Efficient Extrapolation Across General Tasks 2024

Expectation Alignment: Handling Reward Misspecification in the Presence of Expectation Mismatch 2024

MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence 2024