Single Timescale Actor-Critic Method to Solve the Linear Quadratic Regulator with Convergence Guarantees

Mo Zhou; Jianfeng Lu

2023 JMLR JMLR 2023

Single Timescale Actor-Critic Method to Solve the Linear Quadratic Regulator with Convergence Guarantees

Abstract

We propose a single timescale actor-critic algorithm to solve the linear quadratic regulator (LQR) problem. A least squares temporal difference (LSTD) method is applied to the critic and a natural policy gradient method is used for the actor. We give a proof of convergence with sample complexity $\mathcal{O}(\varepsilon^{-1} \log(\varepsilon^{-1})^2)$. The method in the proof is applicable to general single timescale bilevel optimization problems. We also numerically validate our theoretical results on the convergence. [abs] [ pdf ][ bib ] [ code ] © JMLR 2023. (edit, beta)

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization and Reinforcement Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Mo Zhou , Jianfeng Lu

Topics

Machine Learning > Optimization & Theory > Optimization Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Policy Learning Reinforcement Learning > Applications > Robotics Mathematics & Optimization > Optimization > Optimal Control

Keywords

temporal difference learning policy gradient natural policy gradient linear quadratic regulator convergence guarantee actor-critic method

Download PDF

Related papers

Flexible Model Aggregation for Quantile Regression 2023

Efficient Computation of Rankings from Pairwise Comparisons 2023

Efficient Structure-preserving Support Tensor Train Machine 2023

Attacks against Federated Learning Defense Systems and their Mitigation 2023

How Do You Want Your Greedy: Simultaneous or Repeated? 2023