Bandit Task Assignment with Unknown Processing Time

Shinji Ito; Daisuke Hatano; Hanna Sumita; Kei Takemura; Takuro Fukunaga; Naonori Kakimura; Ken-Ichi Kawarabayashi

2023 NIPS NeurIPS 2023

Bandit Task Assignment with Unknown Processing Time

Abstract

This study considers a novel problem setting, referred to as \textit{bandit task assignment}, that incorporates the processing time of each task in the bandit setting. In this problem setting, a player sequentially chooses a set of tasks to start so that the set of processing tasks satisfies a given combinatorial constraint. The reward and processing time for each task follow unknown distributions, values of which are revealed only after the task has been completed. The problem generalizes the stochastic combinatorial semi-bandit problem and the budget-constrained bandit problem. For this problem setting, we propose an algorithm based on upper confidence bounds~(UCB) combined with a phased-update approach. The proposed algorithm admits a gap-dependent regret upper bound of $O(MN(1/\Delta){\log T})$ and a gap-free regret upper bound of $\tilde{O}( \sqrt{MNT} )$, where $N$ is the number of the tasks, $M$ is the maximum number of tasks run at the same time, $T$ is the time horizon, and $\Delta$ is the gap between expected per-round rewards of the optimal and best suboptimal sets of tasks. These regret bounds nearly match lower bounds.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — bandit task assignment

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Shinji Ito , Daisuke Hatano , Hanna Sumita , Kei Takemura , Takuro Fukunaga , Naonori Kakimura , Ken-Ichi Kawarabayashi

Topics

Mathematics & Optimization > Optimization > Online Algorithms Machine Learning > Optimization & Theory > Stochastic Methods

Keywords

upper confidence bound regret bound combinatorial semi-bandit bandit task assignment processing time

Download PDF

Related papers

Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning 2023

Generative Modeling through the Semi-dual Formulation of Unbalanced Optimal Transport 2023

Self-Supervised Motion Magnification by Backpropagating Through Optical Flow 2023

Diffused Task-Agnostic Milestone Planner 2023

Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond 2023