Multi-armed Bandit Problems with History

Pannagadatta Shivaswamy; Thorsten Joachims

AISTATS 2012

Abstract

In this paper we consider the stochastic multi-armed bandit problem. Unlike the conventional version of this problem, however, we do not assume that the algorithm starts from scratch: many applications offer observations of some or all of the arms before the algorithm starts. We propose three novel multi-armed bandit algorithms that can exploit this historical data, and we derive an upper bound on the regret of each. The results show that a logarithmic amount of historical data can reduce regret from logarithmic to constant. The effectiveness of the proposed algorithms is demonstrated on a large-scale malicious URL detection problem.
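To make the idea of starting from historical observations concrete, here is a minimal sketch in Python. It is not one of the paper's three algorithms, whose details are not given in this abstract; it is a standard UCB1-style index policy whose per-arm counts and reward sums are seeded from past observations. The function name `ucb_with_history`, the Bernoulli reward simulation, and the choice to count historical pulls inside the logarithm are all assumptions made for illustration.

```python
import math
import random


def ucb_with_history(arm_probs, history, horizon, seed=0):
    """Simulate a UCB1-style policy warm-started with historical rewards.

    arm_probs: success probability of each simulated Bernoulli arm
               (hypothetical environment, used only for this sketch).
    history:   one list of past rewards per arm; a list may be empty.
    horizon:   number of online rounds to play.
    Returns the number of online pulls of each arm.
    """
    rng = random.Random(seed)
    # Seed the statistics with the historical observations.
    counts = [len(h) for h in history]
    sums = [float(sum(h)) for h in history]
    pulls = [0] * len(arm_probs)

    for _ in range(horizon):
        untried = [i for i, c in enumerate(counts) if c == 0]
        if untried:
            # An arm with no data at all (historical or online) is played first.
            arm = untried[0]
        else:
            total = sum(counts)  # assumption: history counts toward the log term

            def index(i):
                mean = sums[i] / counts[i]
                bonus = math.sqrt(2.0 * math.log(total) / counts[i])
                return mean + bonus

            arm = max(range(len(arm_probs)), key=index)

        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    no_history = [[], []]
    with_history = [[0.0] * 50, [1.0] * 50]
    print(ucb_with_history([0.0, 1.0], no_history, 200))
    print(ucb_with_history([0.0, 1.0], with_history, 200))
```

In this toy setting, an arm that already has many historical observations starts with a narrow confidence interval, so the policy spends fewer online rounds exploring it. This mirrors, in a simplified way, the abstract's claim that historical data can reduce regret.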

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Optimization & Theory > Learning Theory
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Learning Types > Online Learning
Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

online learning, multi-armed bandit, regret bound, historical data, URL detection
