Contextual Combinatorial Cascading Bandits

Shuai Li; Baoxiang Wang; Shengyu Zhang; Wei Chen

ICML 2016

Abstract

We propose contextual combinatorial cascading bandits, a combinatorial online learning game in which, at each time step, a learning agent is given a set of contextual information, selects a list of items, and observes the stochastic outcomes of a prefix of the selected list, where the length of the prefix is determined by a stopping criterion. In online recommendation, the stopping criterion might be the first item a user selects; in network routing, it might be the first blocked edge in a path. We consider position discounts in the list order, so that the agent’s reward is discounted depending on the position at which the stopping criterion is met. We design a UCB-type algorithm, $C^3$-UCB, for this problem, prove an $n$-step regret bound of $\tilde{O}(\sqrt{n})$ in the general setting, and give a finer analysis for two special cases. Our work generalizes existing studies in several directions, including contextual information, position discounts, and a more general cascading bandit model. Experiments on synthetic and real datasets demonstrate the advantage of incorporating contextual information and position discounts.
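The feedback model described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes the recommendation-style stopping criterion (the round stops at the first item whose outcome is 1), and it assumes the reward equals the discount of the position where that happens and is 0 if no item triggers the criterion. The function name `play_round` and the parameters `trigger_probs` and `discounts` are hypothetical names chosen for illustration.

```python
import random


def play_round(trigger_probs, discounts, rng):
    """Simulate one round of cascading feedback for an already selected list.

    trigger_probs[k]: assumed probability that the item at position k has
                      outcome 1 (for example, the user clicks it).
    discounts[k]:     assumed position discount applied when the stopping
                      criterion is met at position k.
    Returns (observed, reward): the outcomes of the observed prefix and the
    discounted reward for the round.
    """
    observed = []
    for k, p in enumerate(trigger_probs):
        outcome = 1 if rng.random() < p else 0
        observed.append(outcome)
        if outcome == 1:
            # Stopping criterion met: items after position k stay unobserved.
            return observed, discounts[k]
    # No item triggered the criterion, so the whole list was observed.
    return observed, 0.0
```

For instance, calling `play_round([0.0, 1.0, 0.5], [1.0, 0.9, 0.81], random.Random(0))` always stops at the second position, so only the first two outcomes are observed and the third item yields no feedback. A learning algorithm in this setting has to update its estimates from such prefixes only.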


Topics

Machine Learning > Learning Types > Active Learning

Data Science & Analytics > Applications > Recommender Systems

Machine Learning > Learning Types > Multi-Task Learning

Machine Learning > Optimization & Theory > Online Algorithms

Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

regret analysis, online learning, UCB algorithm, contextual bandit, combinatorial bandit, cascading bandit, position discount
