2008
NIPS
NeurIPS 2008
Algorithms for Infinitely Many-Armed Bandits
Abstract
We consider multi-armed bandit problems where the number of arms is larger than the possible number of experiments. We make a stochastic assumption on the mean-reward of a new selected arm which characterizes its probability of being a near-optimal arm. Our assumption is weaker than in previous works. We describe algorithms based on upper-confidence-bounds applied to a restricted set of randomly selected arms and provide upper-bounds on the resulting expected regret. We also derive a lower-bound which matchs (up to logarithmic factors) the upper-bound in some cases.
🌉
Interdisciplinary Bridge
— Machine Learning and Mathematics & Optimization
📈
Trend Setter
— Online Algorithms
🧭
Keyword Pioneer
— regret analysis
🐣
Hot Topic Early Bird
— stochastic optimization
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy
Authors
Topics
Machine Learning > Optimization & Theory > Learning Theory
Mathematics & Optimization > Optimization > Stochastic Methods
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Learning Types > Online Learning
Machine Learning > Optimization & Theory > Online Algorithms
Machine Learning > Optimization & Theory > Stochastic Methods
Machine Learning > Learning Types > Multi-Armed Bandits
Machine Learning > Learning Types > Exploration-Exploitation