Model-Free Linear Quadratic Control via Reduction to Expert Prediction

Yasin Abbasi-Yadkori; Nevena Lazic; Csaba Szepesvári

AISTATS 2019

Abstract

Model-free approaches for reinforcement learning (RL) and continuous control find policies based only on past states and rewards, without fitting a model of the system dynamics. They are appealing because they are general-purpose and easy to implement; however, they also come with fewer theoretical guarantees than model-based RL. In this work, we present a new model-free algorithm for controlling linear quadratic (LQ) systems, and show that its regret scales as $O(T^{\xi+2/3})$ for any small $\xi>0$, provided the time horizon satisfies $T>C^{1/\xi}$ for a constant $C$. The algorithm is based on a reduction of control of Markov decision processes to an expert prediction problem. In practice, it corresponds to a variant of policy iteration with forced exploration, where the policy in each phase is greedy with respect to the average of all previous value functions. This is the first model-free algorithm for adaptive control of LQ systems that provably achieves sublinear regret and has a polynomial computation cost. Empirically, our algorithm dramatically outperforms standard policy iteration, but performs worse than a model-based approach.
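The phrase "greedy with respect to the average of all previous value functions" can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes the dynamics matrices `A` and `B` are known and evaluates each policy exactly, whereas the model-free method would estimate value functions from observed data, and it omits forced exploration entirely. All function names and the scalar example system are hypothetical, chosen only for illustration.

```python
import numpy as np


def policy_value(A, B, Q, R, K, iters=2000):
    """Quadratic value matrix P of the linear policy u = -K x.

    Assumption: computed by fixed-point iteration with KNOWN dynamics,
    a stand-in for the data-driven value estimation of a model-free method.
    Converges only if A - B K is stable.
    """
    A_cl = A - B @ K
    stage_cost = Q + K.T @ R @ K
    P = np.zeros_like(Q)
    for _ in range(iters):
        P = stage_cost + A_cl.T @ P @ A_cl
    return P


def q_matrix(A, B, Q, R, P):
    """Matrix G of the state-action value function [x; u]^T G [x; u]."""
    top = np.hstack([Q + A.T @ P @ A, A.T @ P @ B])
    bottom = np.hstack([B.T @ P @ A, R + B.T @ P @ B])
    return np.vstack([top, bottom])


def greedy_gain(G, n):
    """Gain K such that u = -K x minimizes [x; u]^T G [x; u] over u."""
    return np.linalg.solve(G[n:, n:], G[n:, :n])


def averaged_policy_iteration(A, B, Q, R, K0, phases=20):
    """Policy iteration where each new policy is greedy with respect to
    the average of the state-action value matrices of all earlier phases."""
    n, m = B.shape
    K = K0
    G_sum = np.zeros((n + m, n + m))
    gains = [K0]
    for phase in range(1, phases + 1):
        P = policy_value(A, B, Q, R, K)
        G_sum += q_matrix(A, B, Q, R, P)
        K = greedy_gain(G_sum / phase, n)  # greedy w.r.t. the running average
        gains.append(K)
    return gains


if __name__ == "__main__":
    # Hypothetical scalar system, stable under the initial policy K0 = 0.
    A = np.array([[0.5]])
    B = np.array([[1.0]])
    Q = np.array([[1.0]])
    R = np.array([[1.0]])
    gains = averaged_policy_iteration(A, B, Q, R, np.zeros((1, 1)))
    print("final gain:", gains[-1])
```

Because the value of a linear policy enters the state-action matrix linearly, averaging those matrices is equivalent here to averaging the value matrices themselves. Standard policy iteration would instead use only the most recent matrix in the greedy step.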

Authors

Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvári

Topics

Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Applications > Robotics
Mathematics & Optimization > Optimization > Optimal Control

Keywords

policy iteration, sublinear regret, model-free reinforcement learning, linear quadratic control, expert prediction
