Near Optimal Adversarial Attacks on Stochastic Bandits and Defenses with Smoothed Responses

Shiliang Zuo

AISTATS 2024

Abstract

I study adversarial attacks against stochastic bandit algorithms. At each round, the learner chooses an arm, and a stochastic reward is generated. The adversary strategically adds corruption to the reward, and the learner observes only the corrupted reward. This paper presents two sets of results. The first concerns optimal attack strategies for the adversary, who has a target arm he wishes to promote and aims to manipulate the learner into choosing this arm $T - o(T)$ times. I design attack strategies against UCB and Thompson Sampling that spend only $\widehat{O}(\sqrt{\log T})$ cost. Matching lower bounds are presented, and the vulnerability of UCB, Thompson Sampling, and $\varepsilon$-greedy is exactly characterized. The second concerns how the learner can defend against the adversary. Inspired by the literature on smoothed analysis and behavioral economics, I present two simple algorithms that achieve a competitive ratio arbitrarily close to 1.
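To make the interaction protocol concrete, the following is a minimal simulation sketch of a learner, a stochastic reward, and an adversary that corrupts the reward before the learner sees it. It is not the near-optimal strategy from the paper: the attack shown is a naive baseline that assumes the adversary knows the true arm means and shifts every non-target reward down by a fixed amount. The function name `run_ucb`, the Gaussian reward model, the UCB1 exploration bonus, and the parameters `margin` and `noise` are all assumptions made for illustration.

```python
import math
import random


def run_ucb(means, target, horizon, attack=True, margin=0.5, noise=0.1, seed=0):
    """Simulate UCB1 against a naive reward-corruption adversary.

    Assumptions (illustrative, not from the paper): Gaussian rewards,
    an adversary that knows `means`, and a constant downward shift on
    every non-target arm so the target arm appears best by `margin`.
    Returns (pull counts per arm, total corruption spent).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # sum of *observed* (corrupted) rewards per arm
    total_cost = 0.0      # cumulative absolute corruption

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialize
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )

        reward = rng.gauss(means[arm], noise)   # true stochastic reward

        corruption = 0.0
        if attack and arm != target:
            # Push the arm's apparent mean to `margin` below the target's.
            corruption = max(0.0, means[arm] - means[target] + margin)
        total_cost += corruption

        observed = reward - corruption          # learner sees only this
        counts[arm] += 1
        sums[arm] += observed

    return counts, total_cost


if __name__ == "__main__":
    counts, cost = run_ucb([0.9, 0.5, 0.2], target=2, horizon=5000)
    print("pulls per arm:", counts, "total corruption:", round(cost, 2))
```

In this sketch the target is the arm with the lowest true mean, yet under attack the learner pulls it for most rounds. The total corruption of this naive baseline grows with the number of non-target pulls, which is the quantity the attack strategies in the paper aim to keep small.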

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems

Machine Learning > Learning Types > Adversarial Learning

Machine Learning > Optimization & Theory > Stochastic Methods

Machine Learning > Learning Types > Multi-Armed Bandits

Artificial Intelligence > Core AI > Game Theory

Keywords

adversarial robustness, UCB algorithm, Thompson sampling, adversarial attack, multi-armed bandit, upper confidence bound, stochastic bandit, smoothed analysis, reward manipulation, defense algorithm
