Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning

Philip Thomas; Emma Brunskill

ICML 2016

Abstract

In this paper we present a new way of predicting the performance of a reinforcement learning policy given historical data that may have been generated by a different policy. The ability to evaluate a policy from historical data is important for applications where the deployment of a bad policy can be dangerous or costly. We show empirically that our algorithm produces estimates that often have orders of magnitude lower mean squared error than existing methods; that is, it makes more efficient use of the available data. Our new estimator is based on two advances: an extension of the doubly robust estimator (Jiang & Li, 2015), and a new way to mix between model-based and importance-sampling-based estimates.
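To make the off-policy setting concrete, the following is a minimal sketch of ordinary (trajectory-wise) importance sampling, one of the standard baselines that this line of work builds on. It is not the estimator proposed in the paper. The names `Step`, `Policy`, and `importance_sampling_estimate`, the integer state and action encoding, and the toy data are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical types for this sketch: a step is (state, action, reward),
# and a policy maps (state, action) to the probability of taking that action.
Step = Tuple[int, int, float]
Policy = Callable[[int, int], float]


def importance_sampling_estimate(
    trajectories: Sequence[Sequence[Step]],
    evaluation_policy: Policy,
    behavior_policy: Policy,
    gamma: float = 1.0,
) -> float:
    """Estimate the evaluation policy's expected discounted return
    from trajectories that were generated by the behavior policy.

    Assumes the behavior policy gives nonzero probability to every
    action that appears in the data.
    """
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0             # product of per-step probability ratios
        discounted_return = 0.0  # discounted sum of observed rewards
        for t, (state, action, reward) in enumerate(trajectory):
            weight *= evaluation_policy(state, action) / behavior_policy(state, action)
            discounted_return += (gamma ** t) * reward
        total += weight * discounted_return
    return total / len(trajectories)


# Toy data (assumed): one state, two actions, reward 1.0 only for action 1.
def uniform_behavior(state: int, action: int) -> float:
    return 0.5


def always_action_one(state: int, action: int) -> float:
    return 1.0 if action == 1 else 0.0


toy_trajectories = [[(0, 1, 1.0)], [(0, 0, 0.0)]]
estimate = importance_sampling_estimate(
    toy_trajectories, always_action_one, uniform_behavior
)
```

The importance weight corrects for the mismatch between the policy that produced the data and the policy being evaluated. The product of ratios can vary widely over long trajectories, which is the kind of inefficiency that motivates doubly robust and blended estimators.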

Topics

Machine Learning > Learning Types > Unsupervised Learning
Machine Learning > Optimization & Theory > Statistical Learning
Reinforcement Learning > Methods > Offline RL
Machine Learning > Learning Types > Reinforcement Learning

Keywords

reinforcement learning, off-policy evaluation, importance sampling, model-based reinforcement learning, value function approximation, doubly robust estimator, model-based estimation
