Problem-dependent Regret for Lexicographic Multi-Armed Bandits with Adversarial Corruptions

Bo Xue; Xi Lin; Yuanyu Wan; Qingfu Zhang

2025 IJCAI IJCAI 2025

Problem-dependent Regret for Lexicographic Multi-Armed Bandits with Adversarial Corruptions

Abstract

This paper studies lexicographic multi-armed bandits (MAB), where after selecting an arm, the agent observes a reward vector including multiple objectives, each with a different level of importance. Although previous literature has proposed the algorithm for lexicographic MAB, their algorithm suffers from several limitations: (1) it exhibits poor adversarial robustness due to its reliance on stochastic rewards, (2) its regret bound is suboptimal compared to single-objective counterparts, and (3) the regret bound does not adapt to specific problem instances. To address these limitations, we study lexicographic MAB with adversarial corruptions, where an adversary might corrupt the stochastic rewards with a corruption budget of C. First, when the value of C is known, we propose an algorithm achieving a problem-dependent regret bound of O(∑(log T / Δⁱ(a) + C)) for the i-th objective (i ∈ [M]), where Δⁱ(a) is the reward gap for arm a on the i-th objective, and M is the number of objectives. In the purely stochastic setting (C=0), this regret bound approaches optimality. Second, we introduce another algorithm that does not require value of C but incurs a less favorable regret bound of O(∑(γ_T / Δⁱ(a) + γ_T)) for the i-th objective, where γ_T = O((log T)² + KC(log T)²). Finally, we conduct experiments on both synthetic and real-world datasets to verify the effectiveness of our algorithms.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Bo Xue , Xi Lin , Yuanyu Wan , Qingfu Zhang

Topics

Machine Learning > Learning Types > Adversarial Learning Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Optimization & Theory > Optimization Mathematics & Optimization > Optimization > Online Algorithms Machine Learning > Optimization & Theory > Online Algorithms Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

online learning multi-armed bandit regret bound adversarial corruption lexicographic optimization problem-dependent regret

Download PDF

Related papers

Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain 2025

Responsibility Anticipation and Attribution in LTLf 2025

Argument-based Multi-Issue Negotiation 2025

Online Resource Sharing: Better Robust Guarantees via Randomized Strategies 2025

Equitable Mechanism Design for Facility Location 2025