Resourceful Contextual Bandits

Ashwinkumar Badanidiyuru; John Langford; Aleksandrs Slivkins

COLT 2014

Abstract

We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items. We design the first algorithm for solving these problems that improves over a trivial reduction to the non-contextual case. We consider very general settings for both contextual bandits (arbitrary policy sets, Dudik et al. (2011)) and bandits with resource constraints (bandits with knapsacks, Badanidiyuru et al. (2013a)), and prove a regret guarantee with near-optimal statistical properties.
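The setting described in the abstract can be illustrated with a toy dynamic-pricing simulation. This is only a minimal sketch of the interaction protocol (a context arrives, an action is chosen, a reward is collected and a limited resource is consumed), not the algorithm from the paper. The price list, the demand model in `buy_probability`, and the function names are all hypothetical and chosen purely for illustration.

```python
import random

# Hypothetical action set: index 0 is a "skip" action that sells nothing.
PRICES = [0.0, 1.0, 2.0]

def buy_probability(context, price):
    """Assumed toy demand model: "high" contexts tolerate higher prices."""
    if price == 0.0:
        return 0.0
    base = 0.9 if context == "high" else 0.5
    return base / price

def run_episode(policy, horizon, budget, seed=0):
    """Play until the time horizon ends or the inventory budget runs out.

    `policy` maps a context to an index into PRICES.
    Returns (total_reward, units_consumed, rounds_played).
    """
    rng = random.Random(seed)
    remaining = budget
    total_reward = 0.0
    rounds = 0
    while rounds < horizon and remaining > 0:
        context = rng.choice(["low", "high"])   # context arrives
        price = PRICES[policy(context)]         # policy picks an action
        sold = rng.random() < buy_probability(context, price)
        if sold:
            total_reward += price               # reward: revenue from the sale
            remaining -= 1                      # resource: one unit of inventory
        rounds += 1
    return total_reward, budget - remaining, rounds

def price_by_context(context):
    """Example policy: charge more when the context is "high"."""
    return 2 if context == "high" else 1

reward, used, rounds = run_episode(price_by_context, horizon=1000, budget=50)
print(reward, used, rounds)
```

A policy here is just a function from contexts to actions, which loosely mirrors the idea of competing with a policy set; the budget check in the loop is what distinguishes this from an ordinary contextual bandit simulation.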

Authors

Ashwinkumar Badanidiyuru, John Langford, Aleksandrs Slivkins

Topics

Machine Learning > Application Areas > Risk Management

Machine Learning > Learning Types > Reinforcement Learning

Machine Learning > Optimization & Theory > Online Algorithms

Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

online learning, regret bound, contextual bandit, knapsack problem, resource constraint

Related papers

Open Problem: Shifting Experts on Easy Data 2014

Lipschitz Bandits: Regret Lower Bound and Optimal Algorithms 2014

Sample Complexity Bounds on Differentially Private Learning via Communication Complexity 2014

Principal Component Analysis and Higher Correlations for Distributed Data 2014

Compressed Counting Meets Compressed Sensing 2014