Further Adaptive Best-of-Both-Worlds Algorithm for Combinatorial Semi-Bandits

Taira Tsuchiya; Shinji Ito; Junya Honda

AISTATS 2023

Abstract

We consider the combinatorial semi-bandit problem and present a new algorithm with a best-of-both-worlds regret guarantee: the regret is bounded near-optimally in both the stochastic and adversarial regimes. In the stochastic regime, we prove a variance-dependent regret bound that depends on the tight suboptimality gap introduced by Kveton et al. (2015) and has a good leading constant. In the adversarial regime, we show that the same algorithm simultaneously obtains various data-dependent regret bounds. Our algorithm is based on the follow-the-regularized-leader framework with a refined regularizer and an adaptive learning rate. Finally, we numerically test the proposed algorithm and confirm that, under most settings, its performance is superior to or competitive with that of existing algorithms, including Thompson sampling.
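For orientation, the problem setting named in the abstract can be sketched in a few lines of Python. This is a minimal illustration of the standard combinatorial semi-bandit feedback model and of regret against the best fixed action in hindsight. It is not the paper's algorithm. The function names `semi_bandit_feedback` and `regret`, and the encoding of actions as 0/1 vectors over base arms, are assumptions made for this example.

```python
import numpy as np


def semi_bandit_feedback(action, loss):
    """Return the losses the learner observes: one per selected base arm.

    `action` is a 0/1 vector over base arms (assumed encoding);
    `loss` is the full loss vector for the round, mostly hidden from the learner.
    """
    action = np.asarray(action)
    loss = np.asarray(loss, dtype=float)
    return {int(i): float(loss[i]) for i in np.flatnonzero(action)}


def regret(losses, played, action_set):
    """Cumulative loss of the played actions minus that of the best fixed action."""
    losses = np.asarray(losses, dtype=float)          # shape (T, d)
    played = np.asarray(played, dtype=float)          # shape (T, d)
    action_set = np.asarray(action_set, dtype=float)  # shape (K, d)
    incurred = float(np.sum(losses * played))
    # Loss each fixed action would have accumulated over all T rounds.
    fixed_totals = action_set @ losses.sum(axis=0)
    return incurred - float(fixed_totals.min())


# Toy instance: choose 2 of 3 base arms per round, for 2 rounds.
action_set = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
losses = [[0.1, 0.5, 0.9], [0.2, 0.4, 0.8]]
played = [[0, 1, 1], [1, 1, 0]]
toy_regret = regret(losses, played, action_set)
first_round_feedback = semi_bandit_feedback(played[0], losses[0])
```

In this toy instance the learner sees only the losses of the two arms it selected in each round, and its regret is measured against always playing the first action, which has the smallest total loss.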

Authors

Taira Tsuchiya, Shinji Ito, Junya Honda

Topics

Machine Learning > Optimization & Theory > Learning Theory
Mathematics & Optimization > Optimization > Combinatorial Optimization
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Learning Types > Online Learning
Machine Learning > Optimization & Theory > Online Algorithms
Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

combinatorial optimization, online learning, Thompson sampling, multi-armed bandit, regret bound, combinatorial semi-bandit, stochastic regime
