Cost-Effective Incentive Allocation via Structured Counterfactual Inference

Romain Lopez; Chenchen Li; Xiang Yan; Junwu Xiong; Michael Jordan; Yuan Qi; Le Song

AAAI 2020

Abstract

We address a practical problem ubiquitous in modern marketing campaigns, in which a central agent tries to learn a policy for allocating strategic financial incentives to customers and observes only bandit feedback. In contrast to traditional policy optimization frameworks, we take into account the additional reward structure and budget constraints common in this setting, and develop a new two-step method for solving this constrained counterfactual policy optimization problem. Our method first casts the reward estimation problem as a domain adaptation problem with supplementary structure, and then uses the resulting estimators to optimize the policy under constraints. We also establish theoretical error bounds for our estimation procedure, and we show empirically that the approach leads to significant improvements on both synthetic and real datasets.
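
The second step (optimizing an allocation policy under a budget) can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes that per-customer reward estimates for each incentive level are already available (for instance, produced by an estimation step like the first one), that each incentive level has a fixed cost, and that a zero-cost "no incentive" level exists. It uses a standard Lagrangian relaxation: for a multiplier lam, each customer receives the level maximizing estimated reward minus lam times cost, and lam is found by bisection so that the average cost stays within the budget. The function names (`greedy_actions`, `average_cost`, `allocate_under_budget`) and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def greedy_actions(rewards: Sequence[Sequence[float]], costs: Sequence[float], lam: float) -> List[int]:
    """Give each customer the incentive level maximizing estimated reward - lam * cost."""
    return [
        max(range(len(costs)), key=lambda k: row[k] - lam * costs[k])
        for row in rewards
    ]


def average_cost(actions: Sequence[int], costs: Sequence[float]) -> float:
    """Mean cost per customer of an allocation."""
    return sum(costs[a] for a in actions) / len(actions)


def allocate_under_budget(
    rewards: Sequence[Sequence[float]],
    costs: Sequence[float],
    budget: float,
    iters: int = 60,
) -> List[int]:
    """Lagrangian allocation: bisect on lam so the average cost does not exceed the budget."""
    if budget < min(costs):
        raise ValueError("budget is below the cheapest incentive level")
    # With lam = 0 costs are ignored; if that allocation is affordable, it is optimal.
    actions = greedy_actions(rewards, costs, 0.0)
    if average_cost(actions, costs) <= budget:
        return actions
    # Invariant: lam = lo is over budget, lam = hi is within budget.
    lo, hi = 0.0, 1.0
    while average_cost(greedy_actions(rewards, costs, hi), costs) > budget:
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if average_cost(greedy_actions(rewards, costs, mid), costs) > budget:
            lo = mid
        else:
            hi = mid
    return greedy_actions(rewards, costs, hi)


if __name__ == "__main__":
    # Made-up estimates: 3 customers x 3 incentive levels (level 0 = no incentive).
    rewards = [[0.0, 0.5, 0.9], [0.0, 0.1, 0.2], [0.0, 0.6, 0.7]]
    costs = [0.0, 1.0, 2.0]
    print(allocate_under_budget(rewards, costs, budget=1.0))  # [2, 0, 1]
```

Because each customer's level is a deterministic argmax, the returned allocation always respects the budget but may leave part of it unspent; randomizing between the allocations on either side of the critical lam can close that gap in the relaxed problem.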

Keywords

causal inference, policy optimization, domain adaptation, policy learning, bandit feedback, counterfactual inference, budget constraint, reward estimation, incentive allocation, counterfactual policy optimization
