Thresholded linear bandits

Nishant A. Mehta; Junpei Komiyama; Vamsi K. Potluru; Andrea Nguyen; Mica Grant-Hagen

2023 AISTATS AISTATS 2023

Thresholded linear bandits

Abstract

We introduce the thresholded linear bandit problem, a novel sequential decision making problem at the interface of structured stochastic multi-armed bandits and learning halfspaces. The set of arms is $[0, 1]^d$, the expected Bernoulli reward is piecewise constant with a jump at a separating hyperplane, and each arm is associated with a cost that is a positive linear combination of the arm’s components. This problem is motivated by several practical applications. For instance, imagine tuning the continuous features of an offer to a consumer; higher values incur higher cost to the vendor but result in a more attractive offer. At some threshold, the offer is attractive enough for a random consumer to accept at the higher probability level. For the one-dimensional case, we present Leftist, which enjoys $\log^2 T$ problem-dependent regret in favorable cases and has $\log(T) \sqrt{T}$ worst-case regret; we also give a lower bound suggesting this is unimprovable. We then present MD-Leftist, our extension of Leftist to the multi-dimensional case, which obtains similar regret bounds but with $d^{2.5} \log d$ and $d^{1.5} \log d$ dependence on dimension for the two types of bounds respectively. Finally, we experimentally evaluate Leftist.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🧭 Keyword Pioneer — piecewise constant reward

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Nishant A. Mehta , Junpei Komiyama , Vamsi K. Potluru , Andrea Nguyen , Mica Grant-Hagen

Topics

Artificial Intelligence > Core AI > Agent Systems Machine Learning > Core Methods > Classification Machine Learning > Optimization & Theory > Statistical Learning

Keywords

sequential decision making multi-armed bandit regret bound linear bandit piecewise constant reward learning halfspace

Download PDF

Related papers

Safe Sequential Testing and Effect Estimation in Stratified Count Data 2023

Who Should Predict? Exact Algorithms For Learning to Defer to Humans 2023

An Online and Unified Algorithm for Projection Matrix Vector Multiplication with Application to Empirical Risk Minimization 2023

Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods 2023

The Ordered Matrix Dirichlet for State-Space Models 2023