A simple multi-armed bandit algorithm with optimal variation-bounded regret

Elad Hazan; Satyen Kale

COLT 2011

Abstract

We pose the question of whether it is possible to design a simple, linear-time algorithm for the basic multi-armed bandit problem in the adversarial setting which has a regret bound of $O(\sqrt{Q \log T})$, where $Q$ is the total quadratic variation of all the arms.
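
To make the quantity in the bound concrete, the sketch below computes one common reading of the total quadratic variation, $Q = \sum_{t}\sum_{i} (\ell_t(i) - \mu(i))^2$ with $\mu(i)$ the average loss of arm $i$ over all $T$ rounds, and then evaluates $\sqrt{Q \log T}$ with the hidden constant dropped. The abstract does not spell out this definition, so the formula, the function names, and the loss-matrix layout are assumptions made for illustration only; this is not the algorithm the abstract asks for.

```python
from math import log, sqrt


def total_quadratic_variation(losses):
    """Sum over arms of the squared deviation of each loss from that arm's mean.

    Assumed layout (hypothetical): losses[t][i] is the loss of arm i in round t.
    """
    T = len(losses)
    n = len(losses[0])
    # Per-arm average loss over the whole horizon.
    means = [sum(row[i] for row in losses) / T for i in range(n)]
    return sum((row[i] - means[i]) ** 2 for row in losses for i in range(n))


def variation_bound(losses):
    """Evaluate sqrt(Q * log T), ignoring the constant hidden by O(.)."""
    T = len(losses)
    Q = total_quadratic_variation(losses)
    return sqrt(Q * log(T))


if __name__ == "__main__":
    # Two arms, four rounds: arm 0 alternates, arm 1 is constant.
    losses = [[0.0, 1.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
    print(total_quadratic_variation(losses))
    print(variation_bound(losses))
```

Under this assumed definition, a loss sequence in which every arm is constant over time has $Q = 0$, so the target bound vanishes no matter how large $T$ is.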

Topics

Mathematics & Optimization > Optimization > Online Algorithms

Machine Learning > Learning Types > Online Learning

Machine Learning > Optimization & Theory > Online Algorithms

Mathematics & Optimization > Optimization > Optimization

Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

online learning, adversarial setting, multi-armed bandit, regret bound, online algorithm, linear-time algorithm, quadratic variation, variation-bounded regret

Related papers

Competitive Closeness Testing 2011

Bandits, Query Learning, and the Haystack Dimension 2011

Minimax Policies for Combinatorial Prediction Games 2011

Sample Complexity Bounds for Differentially Private Learning 2011

Multiclass Learnability and the ERM principle 2011