2011
COLT
A simple multi-armed bandit algorithm with optimal variation-bounded regret
Abstract
We pose the question of whether it is possible to design a simple, linear-time algorithm for the basic multi-armed bandit problem in the adversarial setting that achieves a regret bound of $O(\sqrt{Q \log T})$, where $Q$ is the total quadratic variation of all the arms and $T$ is the number of rounds.
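To make the quantity in the target bound concrete, here is a minimal Python sketch of how $Q$ could be computed offline. It assumes the definition commonly used in prior work on variation-bounded regret, which the abstract does not spell out: each round $t$ has a loss vector $f_t$ over the arms, $\mu = \frac{1}{T}\sum_t f_t$ is the mean loss vector, and $Q = \sum_{t=1}^{T} \lVert f_t - \mu \rVert^2$. The function names `quadratic_variation` and `variation_bound` are hypothetical illustrations, not part of the paper, and this is a measurement of the benchmark quantity, not a bandit algorithm.

```python
import math
from typing import Sequence


def quadratic_variation(losses: Sequence[Sequence[float]]) -> float:
    """Total quadratic variation Q = sum_t ||f_t - mu||^2.

    Assumed definition (not stated in the abstract): losses[t] is the
    loss vector f_t over all arms at round t, and mu is the mean loss
    vector over the T rounds.
    """
    T = len(losses)
    if T == 0:
        return 0.0
    n = len(losses[0])
    # Mean loss of each arm across all rounds.
    mu = [sum(f[i] for f in losses) / T for i in range(n)]
    # Sum of squared Euclidean distances of each round's vector from the mean.
    return sum(sum((f[i] - mu[i]) ** 2 for i in range(n)) for f in losses)


def variation_bound(losses: Sequence[Sequence[float]]) -> float:
    """sqrt(Q * log T), the rate inside the O(.) of the target regret bound."""
    T = len(losses)
    if T == 0:
        return 0.0
    return math.sqrt(quadratic_variation(losses) * math.log(T))


# Losses that never change have zero variation, so the target bound is 0.
steady = [[0.25, 0.5]] * 1000
print(quadratic_variation(steady))  # 0.0

# Two arms whose losses swap every round: mu = [0.5, 0.5] and each
# round contributes 0.25 + 0.25 = 0.5, so Q = 2.0 over 4 rounds.
swapping = [[0.0, 1.0], [1.0, 0.0]] * 2
print(quadratic_variation(swapping))  # 2.0
```

The point of the comparison is that $Q$ can be far smaller than $T$ when the losses are stable, so an $O(\sqrt{Q \log T})$ bound can be much stronger than a bound depending on $\sqrt{T}$. Note that under bandit feedback the learner observes only the loss of the arm it pulls, so it cannot compute $Q$ this way online; the sketch only evaluates the benchmark in hindsight.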