Policy Evaluation Using the Ω-Return

Philip S. Thomas; Scott Niekum; Georgios Theocharous; George Konidaris

NIPS 2015 (NeurIPS 2015)

Abstract

We propose the Ω-return as an alternative to the λ-return currently used by the TD(λ) family of algorithms. The benefit of the Ω-return is that it accounts for the correlation of different-length returns. Because the Ω-return is difficult to compute exactly, we suggest one way of approximating it. We provide empirical studies that suggest that it is superior to the λ-return and γ-return for a variety of problems.
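The λ-return and the Ω-return both estimate a state's value as a weighted average of the n-step returns G^(1), ..., G^(N); they differ in how the weights are chosen. The Python sketch below is illustrative only and rests on assumptions not stated in the abstract: a single episode of N steps, given value estimates V(s_k), and a known covariance matrix Ω of the n-step returns (in practice Ω must be estimated, which is why an approximation is needed). It assumes the Ω-return uses the minimum-variance weights w = Ω⁻¹1 / (1ᵀΩ⁻¹1), which take correlations between returns into account, whereas the λ-return uses fixed geometric weights (1 − λ)λ^(n−1). All function names are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def n_step_returns(rewards: Sequence[float], values: Sequence[float],
                   gamma: float) -> List[float]:
    """Return [G^(1), ..., G^(N)] from the start state s_0 of an N-step episode.

    Assumes rewards[k] = r_k and values[k] = V(s_k) for k = 0..N-1; the state
    after the last reward is terminal, so the N-step return does not bootstrap.
    """
    n_steps = len(rewards)
    returns: List[float] = []
    discounted = 0.0
    for n in range(1, n_steps + 1):
        discounted += gamma ** (n - 1) * rewards[n - 1]
        bootstrap = gamma ** n * values[n] if n < n_steps else 0.0
        returns.append(discounted + bootstrap)
    return returns


def lambda_weights(n_steps: int, lam: float) -> List[float]:
    """Fixed λ-return weights: (1 - lam) * lam**(n-1) on each truncated return,
    with the remaining mass lam**(N-1) on the full Monte Carlo return."""
    weights = [(1 - lam) * lam ** (n - 1) for n in range(1, n_steps)]
    weights.append(lam ** (n_steps - 1))
    return weights


def _solve(matrix: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x


def omega_weights(cov: Sequence[Sequence[float]]) -> List[float]:
    """ASSUMED form of the Ω-return weights: w = Ω^{-1} 1 / (1^T Ω^{-1} 1),
    the minimum-variance weights that sum to 1 for a known covariance Ω."""
    x = _solve(cov, [1.0] * len(cov))
    total = sum(x)
    return [xi / total for xi in x]


def weighted_return(returns: Sequence[float], weights: Sequence[float]) -> float:
    """Combine n-step returns with a given weight vector."""
    return sum(w * g for w, g in zip(weights, returns))


if __name__ == "__main__":
    g = n_step_returns([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0)  # [1.0, 2.0, 3.0]
    print(weighted_return(g, lambda_weights(3, 0.5)))  # 1.75
    # Hypothetical covariance: G^(2) = G^(1) + independent unit-variance noise.
    print(omega_weights([[1.0, 1.0], [1.0, 2.0]]))  # [1.0, 0.0]
    # Same variances with the correlation ignored: roughly [0.667, 0.333].
    print(omega_weights([[1.0, 0.0], [0.0, 2.0]]))
```

In the correlated example, G^(2) is G^(1) plus independent noise, so it contributes only extra variance and receives zero weight; a weighting that ignored the off-diagonal covariance would still give it a third of the mass. This is the kind of correlation the λ-return's fixed weights cannot react to.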

Keywords

reinforcement learning, policy evaluation, temporal difference, lambda return, omega return
