Counterfactual Learning with General Data-Generating Policies

Yusuke Narita; Kyohei Okumura; Akihiro Shimizu; Kohei Yata

AAAI 2023

Abstract

Off-policy evaluation (OPE) attempts to predict the performance of counterfactual policies using log data from a different policy. We extend its applicability by developing an OPE method for a class of logging policies that covers both full-support and deficient-support cases in contextual-bandit settings. This class includes deterministic bandit algorithms (such as Upper Confidence Bound) as well as deterministic decision-making based on supervised and unsupervised learning. We prove that our method's prediction converges in probability to the true performance of a counterfactual policy as the sample size increases. We validate our method with experiments on partly and entirely deterministic logging policies. Finally, we apply it to evaluate coupon targeting policies used by a major online platform and show how to improve the existing policy.
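To illustrate why deficient support matters, the sketch below applies the standard inverse propensity weighting (IPW) estimator to simulated logs. This is a textbook baseline, not the method proposed in the paper. The reward model, the helper names (`ipw_value`, `simulate_logs`, `always_one`), and both logging policies are hypothetical and chosen only for illustration.

```python
import random


def ipw_value(logs, target_policy):
    """Standard IPW estimate of a target policy's value (textbook baseline).

    logs: list of (context, action, reward, logging_propensity) tuples.
    target_policy: function (context, action) -> probability of that action.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        if propensity <= 0.0:
            raise ValueError("logged action must have positive propensity")
        total += target_policy(context, action) / propensity * reward
    return total / len(logs)


def simulate_logs(n, logging_policy, seed=0):
    """Simulate a two-action contextual bandit (hypothetical toy setup).

    Contexts are uniform on [0, 1). Action 1 pays the context value,
    action 0 pays a constant 0.5. logging_policy(x) returns P(action 1 | x).
    """
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        x = rng.random()
        p1 = logging_policy(x)
        a = 1 if rng.random() < p1 else 0
        r = x if a == 1 else 0.5
        propensity = p1 if a == 1 else 1.0 - p1
        logs.append((x, a, r, propensity))
    return logs


def always_one(context, action):
    """Counterfactual policy to evaluate: always choose action 1."""
    return 1.0 if action == 1 else 0.0


# Full support: every action has positive probability in every context.
full_support_logs = simulate_logs(20000, lambda x: 0.5)
estimate_full = ipw_value(full_support_logs, always_one)

# Deficient support: a deterministic rule picks action 1 only when x > 0.5,
# so action 1 is never observed for contexts with x <= 0.5.
deterministic_logs = simulate_logs(20000, lambda x: 1.0 if x > 0.5 else 0.0)
estimate_deficient = ipw_value(deterministic_logs, always_one)

print(f"full support:      {estimate_full:.3f}")
print(f"deficient support: {estimate_deficient:.3f}")
```

In this toy setup the true value of `always_one` is 0.5, the mean of a uniform context. With full support the IPW estimate should land close to that value. Under the deterministic logging rule it instead concentrates near 0.375, because rewards for action 1 in contexts with x <= 0.5 are never logged and contribute nothing to the sum. Closing this kind of gap is the problem the abstract describes.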

Topics

Artificial Intelligence > Core AI > Causal Inference
Machine Learning > Learning Types > Weakly Supervised Learning
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Learning Types > Offline RL

Keywords

off-policy evaluation, policy optimization, policy improvement, contextual bandit, counterfactual learning, counterfactual policy, logging policy, convergence in probability
