Open Problem: Finite-Time Instance Dependent Optimality for Stochastic Online Learning with Feedback Graphs

Teodor Vanislavov Marinov; Mehryar Mohri; Julian Zimmert

COLT 2022

Abstract

Both asymptotic and non-asymptotic instance-dependent regret bounds are known for the stochastic multi-armed bandit problem. Such regret bounds are known to be tight up to lower-order terms in the setting of Gaussian rewards (Garivier et al., 2019). We revisit the related problem of stochastic online learning with feedback graphs, where asymptotically optimal instance-dependent algorithms are known. Surprisingly, the notion of optimal finite-time regret is not a uniquely defined property in this context and, in general, it is decoupled from the asymptotic rate. We pose two open problems. First, we ask for a characterization of the finite-time instance-dependent optimal regret. Next, we ask for a characterization of the set of graphs for which the finite-time regret is bounded by the asymptotically optimal rate, for reasonable values of the time horizon.
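To make the setting concrete, the following is a minimal sketch of the feedback-graph interaction protocol with Gaussian rewards. It is an illustration only and not an algorithm from the paper: the names `play_round` and `simulate`, the three-arm graph, the mean values, and the unit variance are all assumptions chosen for the example. Playing an arm reveals a reward sample for every arm it points to in the graph, while regret is charged only for the arm actually played.

```python
import random

def play_round(graph, means, arm, rng):
    """Play `arm`; return one Gaussian sample for each arm it reveals."""
    # Assumption: unit-variance Gaussian rewards.
    return {j: rng.gauss(means[j], 1.0) for j in graph[arm]}

def simulate(graph, means, policy, horizon, seed=0):
    """Run `policy` for `horizon` rounds; return (pseudo-regret, observation counts)."""
    rng = random.Random(seed)
    best = max(means)
    counts = [0] * len(means)   # observations collected per arm
    sums = [0.0] * len(means)   # running sum of observed rewards per arm
    regret = 0.0
    for t in range(horizon):
        arm = policy(t, counts, sums)
        for j, reward in play_round(graph, means, arm, rng).items():
            counts[j] += 1
            sums[j] += reward
        # Regret depends only on the played arm, not on what was observed.
        regret += best - means[arm]
    return regret, counts

# Hypothetical instance: arm 0 reveals every arm, arms 1 and 2 reveal only themselves.
GRAPH = {0: [0, 1, 2], 1: [1], 2: [2]}
MEANS = [0.0, 0.5, 1.0]

def always_reveal(t, counts, sums):
    return 0   # informative but suboptimal arm

def always_best(t, counts, sums):
    return 2   # optimal arm, which observes only itself
```

In this hypothetical instance, always playing arm 0 gathers observations of every arm at a cost of one unit of regret per round, whereas always playing arm 2 incurs no regret but learns nothing about the other arms. This trade-off between informative and rewarding arms is what distinguishes the feedback-graph setting from the standard bandit problem.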

Topics

Machine Learning > Optimization & Theory > Learning Theory
Mathematics & Optimization > Optimization > Stochastic Methods

Keywords

multi-armed bandit, feedback graph, stochastic online learning, instance dependent regret
