Model-Free and Model-Based Policy Evaluation when Causality is Uncertain

David A Bruns-Smith

ICML 2021

Abstract

When decision-makers can directly intervene, policy evaluation algorithms give valid causal estimates. In off-policy evaluation (OPE), however, there may be unobserved variables that both affect the dynamics and are used by the unknown behavior policy. These “confounders” introduce spurious correlations, so naive estimates for a new policy will be biased. We develop worst-case bounds to assess sensitivity to these unobserved confounders in finite horizons when confounders are drawn i.i.d. each period. We demonstrate that a model-based approach with robust MDPs gives sharper lower bounds by exploiting domain knowledge about the dynamics. Finally, we show that when unobserved confounders are persistent over time, OPE is far more difficult and existing techniques produce extremely conservative bounds.
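To make the confounding bias concrete, the following is a minimal one-step sketch and is not taken from the paper. The confounder distribution `P_U`, the behavior policy `P_A1_GIVEN_U`, and the reward function are all hypothetical values chosen for illustration. The reward depends only on the unobserved confounder, so the action has no causal effect, yet the logged data make action 1 look much better than it is.

```python
# Hypothetical one-step example; every number below is an illustrative assumption.
P_U = {0: 0.5, 1: 0.5}            # distribution of the unobserved confounder U
P_A1_GIVEN_U = {0: 0.1, 1: 0.9}   # behavior policy: P(A = 1 | U = u)


def expected_reward(action, u):
    """Expected reward; it depends only on U, so the action has no causal effect."""
    return float(u)


def naive_value_of_action_1():
    """E[R | A = 1]: what averaging logged rewards for action 1 converges to."""
    p_a1 = sum(P_U[u] * P_A1_GIVEN_U[u] for u in P_U)
    joint = sum(P_U[u] * P_A1_GIVEN_U[u] * expected_reward(1, u) for u in P_U)
    return joint / p_a1


def true_value_of_action_1():
    """E[R | do(A = 1)]: the value of always taking action 1 under intervention."""
    return sum(P_U[u] * expected_reward(1, u) for u in P_U)


naive = naive_value_of_action_1()
truth = true_value_of_action_1()
print(f"naive estimate: {naive:.2f}, interventional value: {truth:.2f}")
```

In this toy setting the naive estimate is about 0.9 while the interventional value is 0.5. The gap arises because the behavior policy selects action 1 mostly when the confounder already favors a high reward.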

Topics

Artificial Intelligence > Core AI > Causal Inference
Reinforcement Learning > Methods > Offline RL
Knowledge & Reasoning > Reasoning > Causal Inference
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Learning Types > Offline RL

Keywords

causal inference, off-policy evaluation, model-based reinforcement learning, confounding variable, robust MDP
