2008 NIPS NeurIPS 2008

Algorithms for Infinitely Many-Armed Bandits

Abstract

We consider multi-armed bandit problems where the number of arms is larger than the possible number of experiments. We make a stochastic assumption on the mean-reward of a new selected arm which characterizes its probability of being a near-optimal arm. Our assumption is weaker than in previous works. We describe algorithms based on upper-confidence-bounds applied to a restricted set of randomly selected arms and provide upper-bounds on the resulting expected regret. We also derive a lower-bound which matchs (up to logarithmic factors) the upper-bound in some cases.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization
📈 Trend Setter — Online Algorithms
🧭 Keyword Pioneer — regret analysis
🐣 Hot Topic Early Bird — stochastic optimization
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy