2019
UAI
UAI 2019
On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits
Abstract
We make three contributions to the theory of k-armed adversarial bandits. First, we prove a first-order bound for a modified variant of the INF strategy by Audibert and Bubeck [2009], without sacrificing worst case optimality or modifying the loss estimators. Second, we provide a variance analysis for algorithms based on follow the regularised leader, showing that without adaptation the variance of the regret is typically {\Omega}(n^2) where n is the horizon. Finally, we study bounds that depend on the degree of separation of the arms, generalising the results by Cowan and Katehakis [2015] from the stochastic setting to the adversarial and improving the result of Seldin and Slivkins [2014] by a factor of log(n)/log(log(n)).
🚀
Conference Pioneer
— UAI 2019
🧭
Keyword Pioneer
— first-order bound
🐝
Cross-Pollinator
— Artificial Intelligence, Data Science & Analytics, Deep Learning, Machine Learning, Mathematics & Optimization, Reinforcement Learning
🌉
Interdisciplinary Bridge
— Machine Learning and Mathematics & Optimization