Online Estimation and Control with Optimal Pathlength Regret

Gautam Goel; Babak Hassibi

L4DC 2022

Abstract

A natural goal when designing online learning algorithms for non-stationary environments is to bound the regret of the algorithm in terms of the temporal variation of the input sequence. Intuitively, when the variation is small, it should be easier for the algorithm to achieve low regret, since past observations are predictive of future inputs. Such data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including online convex optimization (OCO) and bandits. We obtain the first pathlength regret bounds for online control and estimation (e.g. Kalman filtering) in linear dynamical systems. The key idea in our derivation is to reduce pathlength-optimal filtering and control to certain variational problems in robust estimation and control; these reductions may be of independent interest. Numerical simulations confirm that our pathlength-optimal algorithms outperform traditional H-2 and H-infinity algorithms when the environment varies over time.
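The abstract does not state a formula for the temporal variation it refers to. As a rough illustration only, the sketch below assumes a common convention in which the pathlength of a sequence is the sum of squared Euclidean distances between consecutive inputs (an unsquared variant is also widely used). The function name `pathlength` and the three example sequences are hypothetical and are not taken from the paper.

```python
import math

def pathlength(seq, squared=True):
    """Sum of (squared) Euclidean distances between consecutive vectors.

    Assumed definition for illustration; the paper's exact convention may differ.
    """
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        d2 = sum((c - p) ** 2 for p, c in zip(prev, cur))
        total += d2 if squared else math.sqrt(d2)
    return total

T = 100
constant = [(1.0,)] * T                            # no variation at all
slow = [(math.sin(0.01 * t),) for t in range(T)]   # slowly drifting input
fast = [((-1.0) ** t,) for t in range(T)]          # flips sign every step

for name, seq in [("constant", constant), ("slow", slow), ("fast", fast)]:
    print(name, pathlength(seq))
```

Under this assumed definition, the constant sequence has zero pathlength, the slowly drifting sequence has a small one, and the alternating sequence has a large one. A pathlength regret bound would therefore be tightest in the first two cases, where past inputs are most predictive of future ones.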

Topics

Machine Learning > Optimization & Theory > Stochastic Processes
Mathematics & Optimization > Optimization > Online Algorithms

Keywords

optimal control, Kalman filter, state estimation, online convex optimization, linear dynamical system, pathlength regret
