Provably Sample Efficient Reinforcement Learning in Competitive Linear Quadratic Systems

Jingwei Zhang; Zhuoran Yang; Zhengyuan Zhou; Zhaoran Wang

2021 L4DC L4DC 2021

Provably Sample Efficient Reinforcement Learning in Competitive Linear Quadratic Systems

Abstract

We study the infinite-horizon zero-sum linear quadratic (LQ) games, where the state transition is linear and the cost function is quadratic in states and actions of two players. In particular, we develop an adaptive algorithm that can properly trade off between exploration and exploitation of the unknown environment in LQ games based on the optimism-in-face-of-uncertainty (OFU) principle. We show that (i) the average regret of player $1$ (the min player) can be bounded by $\widetilde{\mathcal{O}}(1/\sqrt{T})$ against any fixed linear policy of the adversary (player $2$); (ii) the average cost of player $1$ also converges to the value of the game at a sublinear $\widetilde{\mathcal{O}}(1/\sqrt{T})$ rate if the adversary plays adaptively against player $1$ with the same algorithm, i.e., with self-play. To the best of our knowledge, this is the first time that a probably sample efficient reinforcement learning algorithm is proposed for zero-sum LQ games.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🧭 Keyword Pioneer — linear quadratic game

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Jingwei Zhang , Zhuoran Yang , Zhengyuan Zhou , Zhaoran Wang

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems Machine Learning > Optimization & Theory > Learning Theory Reinforcement Learning Machine Learning > Learning Types > Reinforcement Learning Machine Learning > Learning Types > Multi-Agent Systems Mathematics & Optimization > Optimization > Game Theory

Keywords

reinforcement learning regret bound zero-sum game optimism in face of uncertainty linear quadratic game average regret bound

Download PDF

Related papers

Abstraction-based branch and bound approach to Q-learning for hybrid optimal control 2021

Data-driven design of switching reference governors for brake-by-wire applications 2021

Learning local modules in dynamic networks 2021

Certainty Equivalent Perception-Based Control 2021

Sample Complexity of Linear Quadratic Gaussian (LQG) Control for Output Feedback Systems 2021