Piecewise-Stationary Bandits with Knapsacks

Xilin Zhang; Cheung Wang Chi

NeurIPS 2024

Abstract

We study Bandits with Knapsacks (BwK) in a piecewise-stationary environment. We propose a novel inventory reserving algorithm that offers new insights into the problem. Suppose parameters $\eta_{\min}, \eta_{\max} \in (0,1]$ are, respectively, lower and upper bounds on the reward earned and the resources consumed in a time round. Our algorithm achieves a provably near-optimal competitive ratio of $O(\log(\eta_{\max}/\eta_{\min}))$, and we provide a matching lower bound. Our performance guarantee is measured against a dynamic benchmark, which distinguishes our work from existing works on adversarial BwK that compare against a static benchmark. Furthermore, unlike existing work on non-stationary BwK, we do not require a bounded global variation.

Authors

Xilin Zhang; Cheung Wang Chi

Topics

Machine Learning > Optimization & Theory > Optimization
Machine Learning > Optimization & Theory > Statistical Learning
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Learning Types > Multi-Armed Bandits
Machine Learning > Learning Types > Optimization

Keywords

resource allocation; multi-armed bandit; regret bound; online algorithm; competitive ratio; dynamic benchmark; inventory management; bandits with knapsack
