Budget-Constrained Bandits over General Cost and Reward Distributions

Semih Cayci; Atilla Eryilmaz; R Srikant

AISTATS 2020

Abstract

We consider a budget-constrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is general in the sense that it allows correlated and potentially heavy-tailed cost-reward pairs that can take on negative values as required by many applications. We show that if moments of order $(2+\gamma)$ for some $\gamma > 0$ exist for all cost-reward pairs, $O(\log B)$ regret is achievable for a budget $B>0$. In order to achieve tight regret bounds, we propose algorithms that exploit the correlation between the cost and reward of each arm by extracting the common information via linear minimum mean-square error estimation. We prove a regret lower bound for this problem, and show that the proposed algorithms achieve tight problem-dependent regret bounds, which are optimal up to a universal constant factor in the case of jointly Gaussian cost and reward pairs.
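The linear minimum mean-square error (LMMSE) idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm; it only shows the standard LMMSE fact that the best linear predictor of a reward $R$ from a correlated cost $X$ has slope $\mathrm{Cov}(X, R)/\mathrm{Var}(X)$, and that the residual has lower variance than $R$ itself when the two are correlated. The function names, the Gaussian data model, and its parameters (`simulate_pairs`, slope 0.8, noise scales 0.5 and 0.3) are assumptions made for illustration only.

```python
import numpy as np


def lmmse_fit(cost, reward):
    """Fit the best linear predictor of reward from cost using sample moments."""
    cost = np.asarray(cost, dtype=float)
    reward = np.asarray(reward, dtype=float)
    cov = np.cov(cost, reward)            # 2x2 sample covariance matrix
    omega = cov[0, 1] / cov[0, 0]         # Cov(X, R) / Var(X)
    intercept = reward.mean() - omega * cost.mean()
    return omega, intercept


def residual_variance(cost, reward, omega, intercept):
    """Sample variance of what remains of reward after removing the linear part."""
    cost = np.asarray(cost, dtype=float)
    reward = np.asarray(reward, dtype=float)
    residual = reward - (omega * cost + intercept)
    return residual.var(ddof=1)


def simulate_pairs(n, seed=0):
    """Hypothetical correlated (cost, reward) pairs; the model is an assumption."""
    rng = np.random.default_rng(seed)
    cost = 1.0 + 0.5 * rng.standard_normal(n)
    reward = 2.0 + 0.8 * (cost - 1.0) + 0.3 * rng.standard_normal(n)
    return cost, reward


if __name__ == "__main__":
    cost, reward = simulate_pairs(10_000)
    omega, intercept = lmmse_fit(cost, reward)
    print("estimated slope:", omega)
    print("reward variance:", reward.var(ddof=1))
    print("residual variance:", residual_variance(cost, reward, omega, intercept))
```

Under this assumed model the reward variance is 0.25 while the residual variance is 0.09, which illustrates why extracting the information shared between cost and reward can tighten confidence intervals.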

Topics

Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Optimization & Theory > Stochastic Methods

Keywords

stochastic optimization, multi-armed bandit, regret bound, budget-constrained optimization, linear minimum mean-square error
