TD_gamma: Re-evaluating Complex Backups in Temporal Difference Learning

George Konidaris; Scott Niekum; Philip S. Thomas

NeurIPS (NIPS) 2011

Abstract

We show that the lambda-return target used in the TD(lambda) family of algorithms is the maximum likelihood estimator for a specific model of how the variance of an n-step return estimate increases with n. We introduce the gamma-return estimator, an alternative target based on a more accurate model of variance, which defines the TD_gamma family of complex-backup temporal difference learning algorithms. We derive TD_gamma, the gamma-return equivalent of the original TD(lambda) algorithm, which eliminates the lambda parameter but can only perform updates at the end of an episode and requires time and space proportional to the episode length. We then derive a second algorithm, TD_gamma(C), with a capacity parameter C. TD_gamma(C) requires C times more time and memory than TD(lambda) and is incremental and online. We show that TD_gamma outperforms TD(lambda) for any setting of lambda on 4 out of 5 benchmark domains, and that TD_gamma(C) performs as well as or better than TD_gamma for intermediate settings of C.
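The idea of treating a complex backup as a maximum likelihood estimate can be illustrated with a minimal sketch. It assumes the n-step returns are independent Gaussian estimates of the same value, in which case the maximum likelihood estimate is their inverse-variance-weighted average. Weights proportional to lambda^(n-1) then correspond to a variance that grows as lambda^(-(n-1)). The second variance model below, in which each additional step adds gamma^(2(i-1)) units of variance, is an assumption made for illustration only; the abstract does not state the exact form of the gamma-return model. All function names are hypothetical, and the weights are simply normalized over the returns available in the episode.

```python
def n_step_returns(rewards, next_values, gamma):
    """n-step returns for the first state of an episode.

    rewards[i] is the reward received on step i + 1, and next_values[i] is
    the current value estimate of the state reached after that step
    (0.0 for a terminal state).
    """
    returns = []
    reward_sum = 0.0
    for n, (reward, next_value) in enumerate(zip(rewards, next_values), start=1):
        reward_sum += gamma ** (n - 1) * reward
        returns.append(reward_sum + gamma ** n * next_value)
    return returns


def inverse_variance_weights(variances):
    """Normalized weights of the ML estimate for independent Gaussian estimates."""
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]


def lambda_model_variances(n_max, lam):
    """Variance model implied by lambda-return weights: grows as lam^-(n-1)."""
    return [lam ** (1 - n) for n in range(1, n_max + 1)]


def accumulating_variances(n_max, gamma):
    """Assumed illustrative model: step i adds gamma^(2(i-1)) units of variance."""
    variances, total = [], 0.0
    for i in range(1, n_max + 1):
        total += gamma ** (2 * (i - 1))
        variances.append(total)
    return variances


def weighted_return(returns, weights):
    """Combine the n-step returns into a single update target."""
    return sum(w * r for w, r in zip(weights, returns))


# Hypothetical three-step episode ending in a terminal state.
returns = n_step_returns([1.0, 0.0, 2.0], [0.5, 0.5, 0.0], gamma=0.9)
lambda_weights = inverse_variance_weights(lambda_model_variances(3, lam=0.5))
gamma_weights = inverse_variance_weights(accumulating_variances(3, gamma=0.9))
lambda_target = weighted_return(returns, lambda_weights)
gamma_target = weighted_return(returns, gamma_weights)
```

Note that the second weighting has no free lambda parameter: its weights depend only on the discount factor gamma, which is already part of the problem definition.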

Topics

Machine Learning > Core Methods > Regression
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Optimization & Theory > Stochastic Processes
Reinforcement Learning > Methods > Deep RL
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Learning Types > Deep Learning

Keywords

stochastic processes, reinforcement learning, temporal difference learning, neural network optimization, value function, variance estimation, value estimation, stochastic method

Related papers

Co-Training for Domain Adaptation 2011

The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning 2011

Learning to Agglomerate Superpixel Hierarchies 2011

A Reinforcement Learning Theory for Homeostatic Regulation 2011

A Global Structural EM Algorithm for a Model of Cancer Progression 2011