Cost-aware Cascading Bandits

Ruida Zhou; Chao Gan; Jing Yang; Cong Shen

2018 IJCAI IJCAI 2018

Cost-aware Cascading Bandits

Abstract

In this paper, we propose a cost-aware cascading bandits model, a new variant of multi-armed bandits with cascading feedback, by considering the random cost of pulling arms. In each step, the learning agent chooses an {\it ordered} list of items and \congr{examines} them sequentially, until certain stopping condition is satisfied. Our objective is then to maximize the expected {\it net reward} in each step, i.e., the reward obtained in each step minus the total cost incurred in examining the items, by deciding the ordered list of items, as well as when to stop examination. We study both the offline and online settings, depending on whether the state and cost statistics of the items are known beforehand. For the offline setting, we show that the Unit Cost Ranking with Threshold 1 (UCR-T1) policy is optimal. For the online setting, we propose a Cost-aware Cascading Upper Confidence Bound (CC-UCB) algorithm, and show that the cumulative regret scales in $O(\log T)$. We also provide a lower bound for all $\alpha$-consistent policies, which scales in $\Omega(\log T)$ and matches our upper bound. The performance of the CC-UCB algorithm is evaluated with both synthetic and real-world data.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — cost-aware learning

🐣 Hot Topic Early Bird — multi-armed bandit

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Ruida Zhou , Chao Gan , Jing Yang , Cong Shen

Topics

Machine Learning > Optimization & Theory > Optimization Mathematics & Optimization > Optimization > Online Algorithms Machine Learning > Learning Types > Online Learning Machine Learning > Optimization & Theory > Online Algorithms Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

online learning sequential decision bayesian optimization multi-armed bandit upper confidence bound regret bound cost-aware learning cascading bandit

Download PDF

Related papers

Semi-Supervised Multi-Modal Learning with Incomplete Modalities 2018

High-dimensional Similarity Learning via Dual-sparse Random Projection 2018

FISH-MML: Fisher-HSIC Multi-View Metric Learning 2018

Generative Warfare Nets: Ensemble via Adversaries and Collaborators 2018

Semi-Supervised Optimal Margin Distribution Machines 2018