Algorithms for Infinitely Many-Armed Bandits

Yizao Wang; Jean-yves Audibert; Rémi Munos

2008 NIPS NeurIPS 2008

Algorithms for Infinitely Many-Armed Bandits

Abstract

We consider multi-armed bandit problems where the number of arms is larger than the possible number of experiments. We make a stochastic assumption on the mean-reward of a new selected arm which characterizes its probability of being a near-optimal arm. Our assumption is weaker than in previous works. We describe algorithms based on upper-confidence-bounds applied to a restricted set of randomly selected arms and provide upper-bounds on the resulting expected regret. We also derive a lower-bound which matchs (up to logarithmic factors) the upper-bound in some cases.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

📈 Trend Setter — Online Algorithms

🧭 Keyword Pioneer — regret analysis

🐣 Hot Topic Early Bird — stochastic optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Yizao Wang , Jean-yves Audibert , Rémi Munos

Topics

Machine Learning > Optimization & Theory > Learning Theory Mathematics & Optimization > Optimization > Stochastic Methods Mathematics & Optimization > Optimization > Online Algorithms Machine Learning > Learning Types > Online Learning Machine Learning > Optimization & Theory > Online Algorithms Machine Learning > Optimization & Theory > Stochastic Methods Machine Learning > Learning Types > Multi-Armed Bandits Machine Learning > Learning Types > Exploration-Exploitation

Keywords

stochastic optimization regret analysis arm selection stochastic reward stochastic process multi-armed bandit upper confidence bound regret bound online algorithm

Download PDF

Related papers

On the Efficient Minimization of Classification Calibrated Surrogates 2008

Hebbian Learning of Bayes Optimal Decisions 2008

Biasing Approximate Dynamic Programming with a Lower Discount Factor 2008

Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation 2008

Domain Adaptation with Multiple Sources 2008