Learning in Zero-Sum Markov Games: Relaxing Strong Reachability and Mixing Time Assumptions

Reda Ouhamma; Maryam Kamgarpour

2026 AAAI AAAI 2026

Learning in Zero-Sum Markov Games: Relaxing Strong Reachability and Mixing Time Assumptions

Abstract

Abstract We address payoff-based decentralized learning in infinite-horizon zero-sum Markov games. In this setting, each player makes decisions based solely on received rewards, without observing the opponent's strategy or actions, nor sharing information. Prior works established polynomial-time convergence to an approximate Nash equilibrium under strong reachability and mixing time assumptions. We propose a convergent algorithm that significantly relaxes these assumptions, requiring only the existence of a single policy with bounded reachability and mixing time. Our key algorithmic novelty is introducing Tsallis entropy regularization to smooth the best-response policy updates. By suitably tuning this regularization, we ensure sufficient exploration, thus bypassing previous stringent assumptions on the MDP. We prove a polynomial-time convergence to an approximate Nash equilibrium by establishing novel properties of the value and policy updates induced by the Tsallis entropy regularizer.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Reda Ouhamma , Maryam Kamgarpour

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems Machine Learning > Core Methods > Representation Learning Machine Learning > Optimization & Theory > Stochastic Processes

Keywords

nash equilibrium zero-sum game entropy regularization markov game tsallis entropy

Download PDF

Related papers

Hi-EF: Benchmarking Emotion Forecasting in Human-interaction 2026

MosaicDoc: A Large-Scale Bilingual Benchmark for Visually Rich Document Understanding 2026

Sparse3DPR: Training-Free 3D Hierarchical Scene Parsing and Task-Adaptive Subgraph Reasoning from Sparse RGB Views 2026

LayerEdit: Disentangled Multi-Object Editing via Conflict-Aware Multi-Layer Learning 2026

HDGS: Hierarchical Dynamic Gaussian Splatting for Urban Driving Scenes 2026