2007
NIPS
NeurIPS 2007
The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information
Abstract
We present Epoch-Greedy, an algorithm for multi-armed bandits with observable side information. Epoch-Greedy has the following properties: No knowledge of a time horizon $T$ is necessary. The regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class. The regret scales as $O(T^{2/3} S^{1/3})$ or better (sometimes, much better). Here $S$ is the complexity term in a sample complexity bound for standard supervised learning.
🌉
Interdisciplinary Bridge
— Machine Learning and Mathematics & Optimization
📈
Trend Setter
— Online Algorithms
🧭
Keyword Pioneer
— epoch-greedy algorithm
🐣
Hot Topic Early Bird
— online learning
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio
Authors
Topics
Machine Learning > Optimization & Theory > Learning Theory
Mathematics & Optimization > Optimization > Stochastic Methods
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Learning Types > Online Learning
Machine Learning > Optimization & Theory > Online Algorithms
Machine Learning > Learning Types > Multi-Armed Bandits