Conditional Importance Sampling for Off-Policy Learning

Mark Rowland; Anna Harutyunyan; Hado Hasselt; Diana Borsa; Tom Schaul; Rémi Munos; Will Dabney

AISTATS 2020

Abstract

The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
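As a rough illustration of what conditioning an importance sampling ratio can mean, the sketch below compares two unbiased off-policy estimators in a toy problem. It is not one of the paper's algorithms. Everything in it is an assumption made for the example: the function name `compare_estimators`, the two-action i.i.d. policies, and a return that depends only on the final action. Under those assumptions the full trajectory ratio can be replaced by its conditional expectation given the final action, which here is just the final-step ratio, because the remaining per-step ratios are independent of it and each has expectation 1 under the behaviour policy.

```python
import random
import statistics


def compare_estimators(n_samples=20000, horizon=10,
                       p_target=0.6, p_behaviour=0.4, seed=0):
    """Toy comparison of ordinary vs. conditional importance sampling.

    Hypothetical setup: at each of `horizon` steps the behaviour policy
    picks action 1 with probability `p_behaviour`; the target policy would
    pick it with probability `p_target`. The return is 1 if the final
    action is 1, else 0, so the true target value equals `p_target`.
    """
    rng = random.Random(seed)
    # Per-step importance ratio pi(a) / mu(a) for each action.
    ratio = {
        1: p_target / p_behaviour,
        0: (1 - p_target) / (1 - p_behaviour),
    }
    ordinary, conditional = [], []
    for _ in range(n_samples):
        actions = [1 if rng.random() < p_behaviour else 0
                   for _ in range(horizon)]
        ret = float(actions[-1])  # return depends only on the last action

        # Ordinary IS: product of per-step ratios over the whole trajectory.
        rho = 1.0
        for a in actions:
            rho *= ratio[a]

        # Conditional IS: E[rho | final action]. The other steps are
        # independent of the final action and their ratios average to 1,
        # so only the final-step ratio remains.
        cond_rho = ratio[actions[-1]]

        ordinary.append(rho * ret)
        conditional.append(cond_rho * ret)

    return {
        "ordinary": (statistics.fmean(ordinary),
                     statistics.pvariance(ordinary)),
        "conditional": (statistics.fmean(conditional),
                        statistics.pvariance(conditional)),
    }


if __name__ == "__main__":
    for name, (mean, var) in compare_estimators().items():
        print(f"{name:12s} mean={mean:.3f} variance={var:.3f}")
```

Both estimators target the same value, but the conditional one should show a noticeably smaller empirical variance in this toy setting, since it discards the ratio fluctuations that carry no information about the return.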

Topics

Reinforcement Learning > Methods > Offline RL
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

reinforcement learning, policy gradient, importance sampling, Bellman equation, off-policy learning, conditional expectation
