Approximate Midpoint Policy Iteration for Linear Quadratic Control

Benjamin Gravell; Iman Shames; Tyler Summers

2021 L4DC L4DC 2021

Approximate Midpoint Policy Iteration for Linear Quadratic Control

Abstract

We present a midpoint policy iteration algorithm to solve linear quadratic optimal control problems in both model-based and model-free settings. The algorithm is a variation of Newton’s method, and we show that in the model-based setting it achieves cubic convergence, which is superior to standard policy iteration and policy gradient algorithms that achieve quadratic and linear convergence, respectively. We also demonstrate that the algorithm can be approximately implemented without knowledge of the dynamics model by using least-squares estimates of the state-action value function from trajectory data, from which policy improvements can be obtained. With sufficient trajectory data, the policy iterates converge cubically to approximately optimal policies, and this occurs with the same available sample budget as the approximate standard policy iteration. Numerical experiments demonstrate effectiveness of the proposed algorithms.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Benjamin Gravell , Iman Shames , Tyler Summers

Topics

Artificial Intelligence > Core AI > Planning Machine Learning > Core Methods > Regression Machine Learning > Optimization & Theory > Optimization

Keywords

reinforcement learning convergence analysis optimal control policy iteration linear quadratic control

Download PDF

Related papers

Abstraction-based branch and bound approach to Q-learning for hybrid optimal control 2021

Data-driven design of switching reference governors for brake-by-wire applications 2021

Learning local modules in dynamic networks 2021

Certainty Equivalent Perception-Based Control 2021

Sample Complexity of Linear Quadratic Gaussian (LQG) Control for Output Feedback Systems 2021