2019 UAI UAI 2019

Problem-dependent Regret Bounds for Online Learning with Feedback Graphs

Abstract

This paper addresses the stochastic multi-armed bandit problem with an undirected feedback graph. We devise a UCB-based algorithm, UCB-NE, to provide a problem-dependent regret bound that depends on a clique covering. Our algorithm obtains regret which provably scales linearly with the clique covering number. Additionally, we provide problem-dependent regret bounds for a Thompson Sampling-based algorithm, TS-N, where again the bounds are linear in the clique covering number. Finally, we present experimental results to see how UCB-NE, TS-N, and a few related algorithms perform practically.

🚀 Conference Pioneer — UAI 2019
🐣 Hot Topic Early Bird — multi-armed bandit
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy
🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization