Bayesian Counterfactual Risk Minimization

Ben London; Ted Sandler

2019 ICML ICML 2019

Bayesian Counterfactual Risk Minimization

Abstract

We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated inverse propensity score estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM. Experimental results indicate that this technique outperforms standard $L_2$ regularization, and that it is competitive with variance regularization while being both simpler to implement and more computationally efficient.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🐣 Hot Topic Early Bird — off-policy evaluation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Ben London , Ted Sandler

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning Machine Learning > Optimization & Theory > Bayesian Inference Machine Learning > Optimization & Theory > Statistical Learning

Keywords

off-policy evaluation bayesian inference pac-bayesian analysis bandit feedback propensity score counterfactual risk minimization

Download PDF

Related papers

Bayesian leave-one-out cross-validation for large data 2019

A Block Coordinate Descent Proximal Method for Simultaneous Filtering and Parameter Estimation 2019

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks 2019

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously 2019

Improved Convergence for $\ell_1$ and $\ell_∞$ Regression via Iteratively Reweighted Least Squares 2019