Contextual Bandits with Linear Payoff Functions

Wei Chu; Lihong Li; Lev Reyzin; Robert Schapire

AISTATS 2011

Abstract

In this paper, we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For $T$ rounds, $K$ actions, and $d$-dimensional feature vectors, we prove an $O\left(\sqrt{Td\ln^3(KT\ln(T)/\delta)}\right)$ regret bound that holds with probability $1-\delta$ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of $\Omega(\sqrt{Td})$ for this setting, matching the upper bound up to logarithmic factors.
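To make the setting concrete, the following is a minimal sketch of a LinUCB-style upper confidence bound rule for linear payoffs. It is an illustration under stated assumptions, not necessarily the exact algorithm analyzed in the paper: it assumes a single shared parameter vector, a ridge-regression estimate with identity regularization, unit-norm contexts, and Gaussian noise. The names `linucb_choose`, `linucb_update`, `run_linucb`, and the exploration width `alpha` are hypothetical and chosen only for this example.

```python
import numpy as np


def linucb_choose(A, b, contexts, alpha):
    """Pick the arm with the largest upper confidence bound.

    A: (d, d) regularized Gram matrix of past chosen contexts.
    b: (d,) sum of reward-weighted past chosen contexts.
    contexts: (K, d) array, one feature vector per arm.
    alpha: exploration width (an assumed tuning parameter).
    """
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b                      # ridge-regression estimate
    means = contexts @ theta               # estimated payoff per arm
    # Confidence width per arm: alpha * sqrt(x^T A^{-1} x)
    widths = alpha * np.sqrt(np.einsum("kd,de,ke->k", contexts, A_inv, contexts))
    return int(np.argmax(means + widths))


def linucb_update(A, b, x, reward):
    """Fold the chosen context x and its observed reward into the statistics."""
    return A + np.outer(x, x), b + reward * x


def run_linucb(theta_star, T=2000, K=5, alpha=1.0, noise=0.1, seed=0):
    """Simulate T rounds on synthetic data and return cumulative regret.

    theta_star is the (assumed) true parameter; contexts are random unit vectors.
    """
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    A, b = np.eye(d), np.zeros(d)
    regret = 0.0
    for _ in range(T):
        contexts = rng.normal(size=(K, d))
        contexts /= np.linalg.norm(contexts, axis=1, keepdims=True)
        arm = linucb_choose(A, b, contexts, alpha)
        expected = contexts @ theta_star   # true expected payoffs
        reward = expected[arm] + noise * rng.normal()
        A, b = linucb_update(A, b, contexts[arm], reward)
        regret += expected.max() - expected[arm]
    return regret
```

Calling `run_linucb(np.array([0.6, 0.8]))` returns the cumulative regret of one simulated run; plotting it against $T$ is one way to compare empirical growth with the $\sqrt{Td}$ rate discussed in the abstract.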

Topics

Machine Learning > Optimization & Theory > Learning Theory; Machine Learning > Learning Types > Reinforcement Learning

Keywords

online learning, upper confidence bound, regret bound, contextual bandit, linear payoff function
