Learning from Logged Implicit Exploration Data

Alex Strehl; John Langford; Lihong Li; Sham M. Kakade

NeurIPS (NIPS) 2010

Abstract

We provide a sound and consistent foundation for the use of nonrandom exploration data in "contextual bandit" or "partially labeled" settings where only the value of a chosen action is learned. The primary challenge in a variety of settings is that the exploration policy, in which "offline" data is logged, is not explicitly known. Prior solutions here require either control of the actions during the learning process, recorded random exploration, or actions chosen obliviously in a repeated manner. The techniques reported here lift these restrictions, allowing the learning of a policy for choosing actions given features from historical data where no randomization occurred or was logged. We empirically verify our solution on two reasonably sized sets of real-world data obtained from an Internet advertising company.
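To make the setting concrete, here is a minimal Python sketch of one way to evaluate a policy from logged data when the logging policy is unknown: first estimate how often each action was chosen in each context, then reweight the logged rewards by the inverse of that estimate, clipped from below by a threshold. The abstract does not spell out the estimator, so the clipping threshold `tau`, the function names, and the toy ad-display log are all assumptions made for illustration, not the authors' implementation or data.

```python
from collections import Counter

# Hypothetical log of (context, chosen_action, observed_reward) triples.
# Only the reward of the chosen action is observed in each record.
logged = [
    ("sports", "ad_A", 1), ("sports", "ad_A", 0),
    ("sports", "ad_A", 1), ("sports", "ad_B", 0),
    ("news", "ad_A", 0), ("news", "ad_B", 1),
    ("news", "ad_B", 1), ("news", "ad_B", 0),
]


def estimate_propensities(log):
    """Estimate pi_hat(a | x) as the empirical frequency of action a in context x."""
    context_counts = Counter(x for x, _, _ in log)
    pair_counts = Counter((x, a) for x, a, _ in log)
    return {(x, a): n / context_counts[x] for (x, a), n in pair_counts.items()}


def offline_value(policy, log, propensities, tau=0.05):
    """Clipped inverse-propensity estimate of the policy's average reward.

    A record contributes only when the policy agrees with the logged action;
    dividing by max(pi_hat, tau) keeps rarely logged actions from dominating.
    """
    total = 0.0
    for x, a, r in log:
        if policy(x) == a:
            total += r / max(propensities[(x, a)], tau)
    return total / len(log)


props = estimate_propensities(logged)


def targeted(x):
    """Candidate policy that picks a different ad per context."""
    return "ad_A" if x == "sports" else "ad_B"


def always_b(x):
    """Candidate policy that ignores the context."""
    return "ad_B"


value_targeted = offline_value(targeted, logged, props)
value_always_b = offline_value(always_b, logged, props)
```

On this toy log the context-dependent policy scores higher than the constant one, which is the kind of comparison such an estimator is meant to support. A larger `tau` lowers variance at the cost of bias for actions the logging policy rarely took.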

Topics

Artificial Intelligence > Core AI > Agent Systems
Machine Learning > Learning Types > Semi-Supervised Learning
Reinforcement Learning > Methods > Deep RL
Machine Learning > Learning Types > Online Learning
Machine Learning > Learning Types > Multi-Armed Bandits
Machine Learning > Learning Types > Offline RL
Machine Learning > Learning Types > Exploration-Exploitation

Keywords

policy learning; partially labeled data; off-policy learning; exploration data; offline learning; contextual bandit
