Responsible Bandit Learning via Privacy-Protected Mean-Volatility Utility

Shanshan Zhao; Wenhai Cui; Bei Jiang; Linglong Kong; Xiaodong Yan

AAAI 2024

Abstract

To protect user privacy, and thereby user safety, privacy-preserving bandit algorithms that maximize the mean reward have been widely studied in applications such as online ride-hailing, advertising recommendation, and personalized healthcare. However, classical bandit learning is irresponsible in such practical applications because it fails to account for risk in online decision-making and ignores external system information. This paper is the first to propose privacy-protected mean-volatility utility as the objective of bandit learning, and it proves that this objective is responsible because it aims to maximize the probability of attaining utility while accounting for risk. Theoretically, the proposed responsible bandit learning is expected to achieve the fastest convergence rate among current bandit algorithms and yields greater statistical power than the classical normality-based test. Finally, simulation studies support the theoretical results and show stronger performance under stricter privacy budgets.
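
The abstract does not define the utility, the privacy mechanism, or the learning rule, so the following Python is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that an arm's mean-volatility utility is its mean reward minus rho times its reward standard deviation, that each reward is clipped to [0, 1] and privatized with Laplace noise of scale 1/epsilon (local differential privacy), and that the learner greedily picks the arm with the largest estimated utility. The function names `privatize`, `mean_volatility_utility`, and `choose_arm` are hypothetical.

```python
import random
import statistics
from typing import Dict, List


def privatize(reward: float, epsilon: float, rng: random.Random) -> float:
    """Clip a reward to [0, 1] and add Laplace(0, 1/epsilon) noise.

    Assumption: rewards are bounded in [0, 1], so one released reward has
    sensitivity 1 and Laplace noise of scale 1/epsilon gives epsilon-local DP.
    """
    clipped = min(max(reward, 0.0), 1.0)
    scale = 1.0 / epsilon
    # The difference of two i.i.d. Exponential(rate=1/scale) draws is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return clipped + noise


def mean_volatility_utility(noisy: List[float], epsilon: float, rho: float) -> float:
    """Estimate mean - rho * std from privatized rewards (needs >= 2 samples).

    Assumption: "volatility" is the reward standard deviation. Laplace(0, b)
    noise adds 2 * b**2 to the variance, so that amount is subtracted before
    the square root is taken (clipped at zero).
    """
    b = 1.0 / epsilon
    mean = statistics.fmean(noisy)
    var = max(statistics.variance(noisy) - 2.0 * b * b, 0.0)
    return mean - rho * var ** 0.5


def choose_arm(histories: Dict[str, List[float]], epsilon: float, rho: float) -> str:
    """Greedy choice: the arm with the largest estimated mean-volatility utility."""
    return max(histories, key=lambda arm: mean_volatility_utility(histories[arm], epsilon, rho))
```

With rho = 0 the rule reduces to mean-reward maximization; raising rho penalizes volatile arms, so a steady arm can beat one with a slightly higher but more variable payoff. A complete bandit algorithm would also need an exploration term (for example, a confidence bonus), which this sketch omits.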

Topics

Artificial Intelligence > Core AI > Agent Systems
Machine Learning > Application Areas > Privacy
Machine Learning > Application Areas > Risk Management
Machine Learning > Learning Types > Reinforcement Learning
Security & Privacy > Privacy
Machine Learning > Learning Types > Multi-Armed Bandits
Machine Learning > Learning Types > Privacy

Keywords

risk management, risk-aware learning, utility optimization, multi-armed bandit, privacy budget, privacy protection, bandit learning, online decision-making, mean-volatility utility, privacy-protected mean-volatility utility
