2023 AISTATS AISTATS 2023

Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation

Abstract

We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm, when combined with tail-averaging. We derive finite time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does not require information about the eigenvalues of the matrix underlying the projected TD fixed point. Our analysis shows that tail-averaged TD converges at the optimal O (1/t) rate, both in expectation and with high probability. In addition, our bounds exhibit a sharper rate of decay for the initial error (bias), which is an improvement over averaging all iterates. We also propose and analyse a variant of TD that incorporates regularisation, and show that this variant fares favourably in problems with ill-conditioned features.

🧭 Keyword Pioneer — tail averaging
🐝 Cross-Pollinator — Artificial Intelligence, Data Science & Analytics, Interdisciplinary, Machine Learning, Mathematics & Optimization, Reinforcement Learning, Robotics
🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning