Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently

Asaf Cassel; Alon Cohen; Tomer Koren

ICML 2020

Abstract

We consider the problem of learning in Linear Quadratic Control systems whose transition parameters are initially unknown. Recent results in this setting have demonstrated efficient learning algorithms with regret growing with the square root of the number of decision steps. We present new efficient algorithms that achieve, perhaps surprisingly, regret that scales only (poly-)logarithmically with the number of steps, in two scenarios: when only the state transition matrix A is unknown, and when only the state-action transition matrix B is unknown and the optimal policy satisfies a certain non-degeneracy condition. On the other hand, we give a lower bound which shows that when the latter condition is violated, square root regret is unavoidable.
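To make the setting concrete, here is a minimal sketch of a one-dimensional special case, x_{t+1} = a x_t + b u_t + w_t with per-step cost q x_t^2 + r u_t^2. It is not the authors' algorithm. The scalar reduction, the function names, and the plain least-squares estimator are all assumptions made for illustration. It shows how the optimal linear policy follows from the Riccati recursion, and how the unknown parameter a (the scalar analogue of A) could be estimated from a trajectory when b is known.

```python
import random


def scalar_riccati_gain(a, b, q, r, iters=1000):
    """Iterate the scalar discrete-time Riccati recursion.

    Returns (k, p), where u_t = -k * x_t is the optimal policy and
    p is the fixed point of the recursion.
    """
    p = q
    for _ in range(iters):
        k = a * b * p / (r + b * b * p)
        p = q + a * p * (a - b * k)
    k = a * b * p / (r + b * b * p)
    return k, p


def rollout(a, b, k, steps, rng, noise_std=1.0):
    """Simulate the closed loop under u_t = -k * x_t with Gaussian noise."""
    xs, us = [0.0], []
    for _ in range(steps):
        u = -k * xs[-1]
        us.append(u)
        xs.append(a * xs[-1] + b * u + rng.gauss(0.0, noise_std))
    return xs, us


def estimate_a(xs, us, b):
    """Least-squares estimate of a, assuming b is known (illustrative only)."""
    num = sum(x * (x_next - b * u) for x, x_next, u in zip(xs, xs[1:], us))
    den = sum(x * x for x in xs[:-1])
    return num / den


def total_cost(xs, us, q, r):
    """Cumulative quadratic cost over the trajectory."""
    return sum(q * x * x + r * u * u for x, u in zip(xs, us))
```

In this scalar case the optimal long-run average cost is p * noise_std**2, so the regret of a learner over T steps is its total cost minus T * p * noise_std**2. The values a = 0.9, b = 1, q = r = 1 are one hypothetical choice for experimenting with these functions.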

Authors

Asaf Cassel, Alon Cohen, Tomer Koren

Topics

Machine Learning > Optimization & Theory > Learning Theory
Reinforcement Learning > Methods > Deep RL
Mathematics & Optimization > Optimization > Stochastic Methods

Keywords

optimal policy, linear quadratic regulator, regret bound, state transition matrix, non-degeneracy condition
