Multi-Step Dyna Planning for Policy Evaluation and Control

Hengshuai Yao; Shalabh Bhatnagar; Dongcui Diao; Richard S. Sutton; Csaba Szepesvári

2009 NIPS NeurIPS 2009

Multi-Step Dyna Planning for Policy Evaluation and Control

Abstract

We extend Dyna planning architecture for policy evaluation and control in two significant aspects. First, we introduce a multi-step Dyna planning that projects the simulated state/feature many steps into the future. Our multi-step Dyna is based on a multi-step model, which we call the {\em $\lambda$-model}. The $\lambda$-model interpolates between the one-step model and an infinite-step model, and can be learned efficiently online. Second, we use for Dyna control a dynamic multi-step model that is able to predict the results of a sequence of greedy actions and track the optimal policy in the long run. Experimental results show that Dyna using the multi-step model evaluates a policy faster than using single-step models; Dyna control algorithms using the dynamic tracking model are much faster than model-free algorithms; further, multi-step Dyna control algorithms enable the policy and value function to converge much faster to their optima than single-step Dyna algorithms.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Reinforcement Learning

🧭 Keyword Pioneer — multi-step planning

🐝 Cross-Pollinator — Artificial Intelligence, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics

🐣 Hot Topic Early Bird — policy evaluation

Authors

Hengshuai Yao , Shalabh Bhatnagar , Dongcui Diao , Richard S. Sutton , Csaba Szepesvári

Topics

Artificial Intelligence > Core AI > Planning Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Policy Learning Machine Learning > Learning Types > Reinforcement Learning Machine Learning > Learning Types > Model-Based RL

Keywords

policy evaluation value function model-based reinforcement learning multi-step planning dyna architecture lambda model multi-step learning dyna planning

Download PDF

Related papers

Solving Stochastic Games 2009

Bilinear classifiers for visual recognition 2009

Zero-shot Learning with Semantic Output Codes 2009

Matrix Completion from Power-Law Distributed Samples 2009

Heavy-Tailed Symmetric Stochastic Neighbor Embedding 2009