Generalized TD Learning

Tsuyoshi Ueno; Shin-ichi Maeda; Motoaki Kawanabe; Shin Ishii

JMLR 2011

Abstract

Since the invention of temporal difference (TD) learning (Sutton, 1988), many new algorithms for model-free policy evaluation have been proposed. Although they have brought much progress in practical applications of reinforcement learning (RL), fundamental problems remain concerning the statistical properties of value function estimation. To solve these problems, we introduce a new framework, semiparametric statistical inference, to model-free policy evaluation. This framework generalizes TD learning and its extensions, and allows us to investigate the statistical properties of both batch and online learning procedures for value function estimation in a unified way, in terms of estimating functions. Furthermore, based on this framework, we derive an optimal estimating function with the minimum asymptotic variance and propose batch and online learning algorithms that achieve this optimality.
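For readers unfamiliar with the baseline that the abstract generalizes, the following is a minimal sketch of tabular TD(0) policy evaluation on a small Markov reward process. It is not the algorithm proposed in the paper. The transition matrix `P`, rewards `r`, discount `gamma`, step-size schedule, and both function names are illustrative assumptions. The quantity `td_error` is the TD error that drives the update; loosely speaking, requiring a weighted sum of such errors to vanish is the kind of estimating equation the abstract alludes to.

```python
import random


def td0_policy_evaluation(P, r, gamma, n_steps, seed=0):
    """Tabular TD(0) along a single trajectory of a Markov reward process.

    All arguments are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    n_states = len(P)
    v = [0.0] * n_states       # value estimates, one per state
    visits = [0] * n_states    # per-state visit counts for the step size
    s = 0
    for _ in range(n_steps):
        # Sample the next state from row s of the transition matrix.
        s_next = rng.choices(range(n_states), weights=P[s])[0]
        # TD error: one-step bootstrapped target minus current estimate.
        td_error = r[s] + gamma * v[s_next] - v[s]
        visits[s] += 1
        alpha = 1.0 / visits[s] ** 0.7  # decaying step size (an assumption)
        v[s] += alpha * td_error
        s = s_next
    return v


def exact_values(P, r, gamma, n_sweeps=2000):
    """Reference values obtained by iterating v <- r + gamma * P v."""
    n_states = len(P)
    v = [0.0] * n_states
    for _ in range(n_sweeps):
        v = [r[i] + gamma * sum(p * vj for p, vj in zip(P[i], v))
             for i in range(n_states)]
    return v


# Hypothetical three-state chain; the reward depends only on the current state.
P = [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]
r = [0.0, 1.0, 2.0]
gamma = 0.9

v_td = td0_policy_evaluation(P, r, gamma, n_steps=200_000)
v_ref = exact_values(P, r, gamma)
```

Comparing `v_td` with `v_ref` gives a rough sense of the estimation error of the online procedure on this toy chain. The statistical behavior of such estimators is the subject the abstract describes.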

Keywords

reinforcement learning, temporal difference learning, policy evaluation, value function, semiparametric inference
