Fast Stochastic Kalman Gradient Descent for Reinforcement Learning

Simone Totaro; Anders Jonsson

L4DC 2021

Abstract

As we move towards real-world applications, there is an increasing need for scalable, online optimization algorithms capable of dealing with the non-stationarity of the real world. We revisit the problem of online policy evaluation in non-stationary deterministic MDPs through the lens of Kalman filtering. We introduce a randomized regularization technique called Stochastic Kalman Gradient Descent (SKGD) that, combined with a low-rank update, generates a sequence of feasible iterates. SKGD is suitable for large-scale optimization of non-linear function approximators. We evaluate the performance of SKGD in two controlled experiments and in one real-world application of microgrid control. In our experiments, SKGD is more robust to drift in the transition dynamics than state-of-the-art reinforcement learning algorithms, and the resulting policies are smoother.
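The abstract does not spell out the SKGD update, so the following is only a minimal sketch of the general idea it builds on: treating the weights of a linear value function as the hidden state of a Kalman filter. It is a generic textbook filter, not the authors' algorithm, and it omits the randomized regularization and low-rank update. The function names, the two-state toy MDP, the noise levels, and the observation model `reward = (phi_s - gamma * phi_next) @ theta + noise` are all assumptions made for illustration.

```python
import numpy as np


def kalman_td_step(theta, P, phi_s, phi_next, reward, gamma=0.9,
                   process_noise=1e-3, obs_noise=1e-2):
    """One Kalman-filter update of linear value weights (illustrative only).

    Assumed observation model:
        reward = (phi_s - gamma * phi_next) @ theta + noise
    The random-walk process noise lets the filter keep tracking theta
    if the underlying MDP drifts over time.
    """
    h = phi_s - gamma * phi_next                      # observation vector
    P_pred = P + process_noise * np.eye(len(theta))   # predict: inflate covariance
    innovation = reward - h @ theta                   # Bellman residual
    s = h @ P_pred @ h + obs_noise                    # innovation variance (scalar)
    k = P_pred @ h / s                                # Kalman gain
    theta_new = theta + k * innovation                # correct the weights
    P_new = P_pred - np.outer(k, h @ P_pred)          # (I - k h^T) P_pred
    return theta_new, P_new


def run_demo(steps=500, gamma=0.9):
    """Evaluate a hypothetical two-state deterministic cycle 0 -> 1 -> 0."""
    features = np.eye(2)          # one-hot features, so theta equals the values
    rewards = [1.0, 0.0]          # reward received when leaving each state
    theta = np.zeros(2)
    P = 10.0 * np.eye(2)          # broad initial uncertainty
    state = 0
    for _ in range(steps):
        next_state = 1 - state
        theta, P = kalman_td_step(theta, P, features[state],
                                  features[next_state], rewards[state], gamma)
        state = next_state
    return theta
```

Calling `run_demo()` returns the estimated values of the two states. For this toy chain the exact values solve V0 = 1 + 0.9 V1 and V1 = 0.9 V0, which gives a simple check on the filter.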

Authors

Simone Totaro, Anders Jonsson

Topics

Machine Learning > Optimization & Theory > Optimization
Reinforcement Learning > Methods > Deep RL

Keywords

stochastic gradient descent, policy optimization, Kalman filtering, non-stationary MDP, online policy evaluation, microgrid control
