Counterfactual Data-Fusion for Online Reinforcement Learners

Andrew Forney; Judea Pearl; Elias Bareinboim

ICML 2017


Abstract

The Multi-Armed Bandit problem with Unobserved Confounders (MABUC) considers decision-making settings where unmeasured variables can influence both the agent’s decisions and received rewards (Bareinboim et al., 2015). Recent findings showed that unobserved confounders (UCs) pose a unique challenge to algorithms based on standard randomization (i.e., experimental data); if UCs are naively averaged out, these algorithms behave sub-optimally, possibly incurring infinite regret. In this paper, we show how counterfactual-based decision-making circumvents these problems and leads to a coherent fusion of observational and experimental data. We then demonstrate this new strategy in an enhanced Thompson Sampling bandit player and support its efficacy with extensive simulations.
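
To make the abstract's central contrast concrete, here is a minimal sketch under explicitly hypothetical assumptions: a two-armed bandit in which a hidden binary confounder U sets both the agent's intent (the arm it would naturally pick) and each arm's payout rate, using illustrative probabilities that do not come from the paper. The names `PAYOUT`, `BetaThompson`, and `simulate` are hypothetical. Conditioning Beta-Bernoulli posteriors on the agent's intent estimates the counterfactual quantity E[Y_{X=a} | X=intent], which is the flavor of counterfactual decision-making the abstract describes. It is not the authors' full algorithm and omits the fusion of observational and experimental data.

```python
import random

# Hypothetical MABUC environment (illustrative numbers, not from the paper):
# a hidden binary confounder U drives both the agent's intent (the arm it
# would "naturally" pick) and the payout rate of each arm.
PAYOUT = {  # P(Y=1 | do(X=arm), U=u), keyed by (u, arm)
    (0, 0): 0.2, (0, 1): 0.8,
    (1, 0): 0.8, (1, 1): 0.2,
}


class BetaThompson:
    """Beta-Bernoulli Thompson sampling; optionally conditions on intent."""

    def __init__(self, use_intent):
        self.use_intent = use_intent
        # Beta(1, 1) prior per (context, arm), stored as [alpha, beta].
        self.post = {(c, a): [1, 1] for c in (0, 1) for a in (0, 1)}

    def _ctx(self, intent):
        # An intent-blind agent pools every round into a single context,
        # which averages the confounder out.
        return intent if self.use_intent else 0

    def choose(self, intent, rng):
        c = self._ctx(intent)
        draws = [rng.betavariate(*self.post[(c, a)]) for a in (0, 1)]
        return 0 if draws[0] >= draws[1] else 1

    def update(self, intent, arm, reward):
        # A success increments alpha; a failure increments beta.
        self.post[(self._ctx(intent), arm)][0 if reward else 1] += 1


def simulate(choose, update=None, rounds=5000, seed=0):
    """Return the average reward of a policy over `rounds` plays."""
    rng = random.Random(seed)
    total = 0
    for _ in range(rounds):
        u = rng.randint(0, 1)   # confounder: never shown to the agent
        intent = u              # assumption: intent mirrors U exactly
        arm = choose(intent, rng)
        reward = 1 if rng.random() < PAYOUT[(u, arm)] else 0
        if update is not None:
            update(intent, arm, reward)
        total += reward
    return total / rounds


if __name__ == "__main__":
    blind, aware = BetaThompson(False), BetaThompson(True)
    print("follow intent (observational):", simulate(lambda i, r: i))
    print("uniform random (experimental):", simulate(lambda i, r: r.randint(0, 1)))
    print("intent-blind TS:", simulate(blind.choose, blind.update))
    print("intent-conditioned TS:", simulate(aware.choose, aware.update))
```

In this toy setting, following intent should average near 0.2, and any intent-blind policy (uniform randomization or standard Thompson sampling) near 0.5, because each arm pays 0.5 once U is averaged out; the intent-conditioned player should approach 0.8. Letting intent reveal U exactly is a deliberate simplification; in general intent is only a partial proxy for the confounders.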

Keywords

Thompson sampling, multi-armed bandit, counterfactual reasoning, unobserved confounder, observational data