A finite-sample analysis of multi-step temporal difference estimates

Yaqi Duan; Martin J. Wainwright

2023 L4DC L4DC 2023

A finite-sample analysis of multi-step temporal difference estimates

Abstract

We consider the problem of estimating the value function of an infinite-horizon $\gamma$-discounted Markov reward process (MRP). We establish non-asymptotic guarantees for a general family of multi-step temporal difference (TD) estimates, including canonical $K$-step look-ahead TD for $K = 1, 2, \ldots$ and the TD$(\lambda)$ family for $\lambda \in [0,1)$ as special cases. Our bounds capture the dependence of these estimates on both the variance as defined by Bellman fluctuations, and the bias arising from possible model mis-specification. Our results reveal that the variance component shows limited sensitivity to the choice of look-ahead defining the estimator itself, while increasing the look-ahead can reduce the bias term. This highlights the benefit of using a larger look-ahead: it reduces bias but need not increase the variance.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🐝 Cross-Pollinator — Artificial Intelligence, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics

Authors

Yaqi Duan , Martin J. Wainwright

Topics

Machine Learning > Optimization & Theory > Theory Reinforcement Learning > Methods > Deep RL

Keywords

temporal difference learning bellman equation value function estimation finite-sample analysis markov reward process

Download PDF

Related papers

Model-Based Reinforcement Learning for Cavity Filter Tuning 2023

Learning on Manifolds: Universal Approximations Properties using Geometric Controllability Conditions for Neural ODEs 2023

Hyperparameter Tuning of an Off-Policy Reinforcement Learning Algorithm for H∞ Tracking Control 2023

Policy Learning for Active Target Tracking over Continuous $SE(3)$ Trajectories 2023

Automated Reachability Analysis of Neural Network-Controlled Systems via Adaptive Polytopes 2023