Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation

Gandharv Patil; Prashanth L.A.; Dheeraj Nagaraj; Doina Precup

2023 AISTATS AISTATS 2023

Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation

Abstract

We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm, when combined with tail-averaging. We derive finite time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does not require information about the eigenvalues of the matrix underlying the projected TD fixed point. Our analysis shows that tail-averaged TD converges at the optimal O (1/t) rate, both in expectation and with high probability. In addition, our bounds exhibit a sharper rate of decay for the initial error (bias), which is an improvement over averaging all iterates. We also propose and analyse a variant of TD that incorporates regularisation, and show that this variant fares favourably in problems with ill-conditioned features.

🧭 Keyword Pioneer — tail averaging

🐝 Cross-Pollinator — Artificial Intelligence, Data Science & Analytics, Interdisciplinary, Machine Learning, Mathematics & Optimization, Reinforcement Learning, Robotics

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

Authors

Gandharv Patil , Prashanth L.A. , Dheeraj Nagaraj , Doina Precup

Topics

Machine Learning > Optimization & Theory > Optimization Machine Learning > Optimization & Theory > Statistical Learning Machine Learning > Optimization & Theory > Stochastic Processes Reinforcement Learning > Methods > Deep RL

Keywords

temporal difference learning linear function approximation finite-time analysis tail averaging finite-time bound projected fixed point

Download PDF

Related papers

Safe Sequential Testing and Effect Estimation in Stratified Count Data 2023

Who Should Predict? Exact Algorithms For Learning to Defer to Humans 2023

An Online and Unified Algorithm for Projection Matrix Vector Multiplication with Application to Empirical Risk Minimization 2023

Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods 2023

The Ordered Matrix Dirichlet for State-Space Models 2023