Deeply-Debiased Off-Policy Interval Estimation

Chengchun Shi; Runzhe Wan; Victor Chernozhukov; Rui Song

ICML 2021

Abstract

Off-policy evaluation learns a target policy’s value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel procedure to construct an efficient, robust, and flexible CI on a target policy’s value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.
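
To make the task in the abstract concrete, the sketch below computes a point estimate and a CI for a target policy's value from data logged under a different behavior policy. It is a minimal sketch under simplifying assumptions: a single-step (bandit) setting, known behavior-policy probabilities, plain importance sampling, and a normal-approximation (Wald) interval. It is not the deeply-debiased procedure proposed in the paper. The function name `is_estimate_with_ci`, the synthetic data, and the reward probabilities are all hypothetical.

```python
import numpy as np


def is_estimate_with_ci(rewards, behavior_probs, target_probs, z=1.96):
    """Importance-sampling value estimate with a Wald-type confidence interval.

    rewards        : observed reward for each logged decision
    behavior_probs : probability the behavior policy gave the logged action
    target_probs   : probability the target policy gives the logged action
    z              : normal quantile (1.96 gives roughly 95% coverage)
    """
    rewards = np.asarray(rewards, dtype=float)
    # Importance weights reweight the logged data toward the target policy.
    weights = np.asarray(target_probs, dtype=float) / np.asarray(behavior_probs, dtype=float)
    terms = weights * rewards  # each term has expectation equal to the target value
    point = terms.mean()
    se = terms.std(ddof=1) / np.sqrt(terms.size)  # standard error of the mean
    return point, (point - z * se, point + z * se)


# Hypothetical synthetic log: two actions, a uniform behavior policy, and a
# target policy that always picks action 1 (so the true target value is 0.7).
rng = np.random.default_rng(0)
n = 5000
actions = rng.integers(0, 2, size=n)
behavior_probs = np.full(n, 0.5)
target_probs = (actions == 1).astype(float)
rewards = rng.binomial(1, np.where(actions == 1, 0.7, 0.3))

value, (lo, hi) = is_estimate_with_ci(rewards, behavior_probs, target_probs)
print(f"estimate={value:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Importance weights become large when the behavior and target policies disagree, so intervals from this baseline can be wide; in multi-step settings the variance of products of per-step weights can grow quickly with the horizon.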

Topics

Machine Learning > Optimization & Theory > Statistical Learning
Machine Learning > Learning Types > Offline RL

Keywords

causal inference, off-policy evaluation, value estimation, confidence interval, policy valuation