Directional Optimism for Safe Linear Bandits

Spencer Hutchinson; Berkay Turan; Mahnoosh Alizadeh

AISTATS 2024

Abstract

The safe linear bandit problem is a version of the classical stochastic linear bandit problem in which the learner's actions must satisfy an uncertain constraint in every round. Due to its applicability to many real-world settings, this problem has received considerable attention in recent years. By leveraging a novel approach that we call directional optimism, we find that it is possible to achieve improved regret guarantees both for well-separated problem instances and for action sets that are finite star convex sets. Furthermore, we propose a novel algorithm for this setting that improves on existing algorithms in empirical performance while enjoying matching regret guarantees. Lastly, we introduce a generalization of the safe linear bandit setting in which the constraints are convex, and we adapt our algorithms and analyses to this setting by leveraging a novel approach based on convex analysis.
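
For readers new to the setting, the block below sketches one common formalization of the safe linear bandit problem from the literature. It is not taken from this paper. The symbols θ*, a*, b, 𝒳, η_t, ε_t and the noise model are our own assumptions, and the paper's notation and assumptions may differ.

```latex
\begin{aligned}
&\text{In each round } t = 1, \dots, T \text{ the learner picks } x_t \in \mathcal{X} \subset \mathbb{R}^d \text{ and observes} \\
&\qquad r_t = \langle \theta^*, x_t \rangle + \eta_t, \qquad y_t = \langle a^*, x_t \rangle + \epsilon_t, \\
&\text{where } \theta^*, a^* \in \mathbb{R}^d \text{ are unknown and } \eta_t, \epsilon_t \text{ are zero-mean noise.} \\
&\text{Safety: } \langle a^*, x_t \rangle \le b \ \text{ for all } t \ \text{(typically required with high probability).} \\
&\text{Regret: } R_T = \sum_{t=1}^{T} \big( \langle \theta^*, x^* \rangle - \langle \theta^*, x_t \rangle \big),
\qquad x^* \in \arg\max_{x \in \mathcal{X} :\ \langle a^*, x \rangle \le b} \langle \theta^*, x \rangle .
\end{aligned}
```

In the convex generalization mentioned in the abstract, the linear constraint is replaced by a convex one. Its exact form is not reproduced here.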

Keywords

regret minimization, multi-armed bandit, regret bound, linear bandit, stochastic bandit, convex constraint, safe linear bandit, directional optimism, safe exploration
