Thompson Sampling Achieves $\tilde{O}(\sqrt{T})$ Regret in Linear Quadratic Control

Taylan Kargin; Sahin Lale; Kamyar Azizzadenesheli; Animashree Anandkumar; Babak Hassibi

2022 COLT COLT 2022

Thompson Sampling Achieves $\tilde{O}(\sqrt{T})$ Regret in Linear Quadratic Control

Abstract

Thompson Sampling (TS) is an efficient method for decision-making under uncertainty, where an action is sampled from a carefully prescribed distribution which is updated based on the observed data. In this work, we study the problem of adaptive control of stabilizable linear-quadratic regulators (LQRs) using TS, where the system dynamics are unknown. Previous works have established that $\tilde{O}(\sqrt{T})$ frequentist regret is optimal for the adaptive control of LQRs. However, the existing methods either work only in restrictive settings, require a priori known stabilizing controllers or utilize computationally intractable approaches. We propose an efficient TS algorithm for the adaptive control of LQRs, TS-based Adaptive Control, TSAC, that attains $\tilde{O}(\sqrt{T})$ regret, even for multidimensional systems, thereby solving the open problem posed in Abeille and Lazaric (2018). TSAC does not require a priori known stabilizing controller and achieves fast stabilization of the underlying system by effectively exploring the environment in the early stages. Our result hinges on developing a novel lower bound on the probability that the TS provides an optimistic sample. By carefully prescribing an early exploration strategy and a policy update rule, we show that TS achieves order-optimal regret in adaptive control of multidimensional stabilizable LQRs. We empirically demonstrate the performance and the efficiency of the proposed algorithm in several adaptive control tasks.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🧭 Keyword Pioneer — stabilizable lqr

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Taylan Kargin , Sahin Lale , Kamyar Azizzadenesheli , Animashree Anandkumar , Babak Hassibi

Topics

Artificial Intelligence > Bayesian & Probabilistic > Bayesian Learning Machine Learning > Optimization & Theory > Online Algorithms

Keywords

thompson sampling adaptive control regret bound linear quadratic control stabilizable lqr

Download PDF

Related papers

Non-Convex Optimization with Certificates and Fast Rates Through Kernel Sums of Squares 2022

Analysis of Langevin Monte Carlo from Poincare to Log-Sobolev 2022

Mirror Descent Strikes Again: Optimal Stochastic Convex Optimization under Infinite Noise Variance 2022

Tight query complexity bounds for learning graph partitions 2022

Pushing the Efficiency-Regret Pareto Frontier for Online Learning of Portfolios and Quantum States 2022