Information Complexity in Bandit Subset Selection

Emilie Kaufmann; Shivaram Kalyanakrishnan

2013 COLT COLT 2013

Information Complexity in Bandit Subset Selection

Abstract

We consider the problem of efficiently exploring the arms of a stochastic bandit to identify the best subset. Under the PAC and the fixed-budget formulations, we derive improved bounds by using KL-divergence-based confidence intervals. Whereas the application of a similar idea in the regret setting has yielded bounds in terms of the KL-divergence between the arms, our bounds in the pure-exploration setting involve the Chernoff information between the arms. In addition to introducing this novel quantity to the bandits literature, we contribute a comparison between the “racing” and “smart sampling” strategies for pure-exploration problems, finding strong evidence in favor of the latter.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — pure exploration

🐝 Cross-Pollinator — Artificial Intelligence, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy

📈 Trend Setter — Evaluation

🐣 Hot Topic Early Bird — pac learning

Authors

Emilie Kaufmann , Shivaram Kalyanakrishnan

Topics

Mathematics & Optimization > Mathematics > Information Theory Machine Learning > Learning Types > Online Learning Machine Learning > Optimization & Theory > Information Theory Machine Learning > Learning Types > Multi-Armed Bandits Machine Learning > Learning Types > Evaluation Machine Learning > Learning Types > Exploration

Keywords

pac learning arm selection multi-armed bandit stochastic bandit pure exploration chernoff information

Download PDF

Related papers

A Tensor Spectral Approach to Learning Mixed Membership Community Models 2013

Adaptive Crowdsourcing Algorithms for the Bandit Survey Problem 2013

Boosting with the Logistic Loss is Consistent 2013

Online Learning with Predictable Sequences 2013

Recovering the Optimal Solution by Dual Random Projection 2013