Bandit Social Learning under Myopic Behavior

Kiarash Banihashem; MohammadTaghi Hajiaghayi; Suho Shin; Aleksandrs Slivkins

NeurIPS 2023

Abstract

We study social learning dynamics motivated by reviews on online platforms. The agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regards to exploration. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals for the arms’ expected rewards. We derive stark exploration failures for any such behavior, and provide matching positive results. As a special case, we obtain the first general results on failure of the greedy algorithm in bandits, thus providing a theoretical foundation for why bandit algorithms should explore.
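The failure of the greedy algorithm mentioned in the abstract can be seen in a small simulation. The sketch below is illustrative only and is not the paper's construction or analysis: it assumes a two-armed Bernoulli bandit, one initial sample per arm, and uniformly random tie-breaking, and the names `run_greedy` and `failure_rate` are hypothetical. If the better arm returns 0 on its initial sample while the worse arm returns 1, the worse arm's empirical mean stays positive forever, so greedy never tries the better arm again. With means 0.5 and 0.6 this event alone has probability 0.5 × 0.4 = 0.2, independent of the horizon.

```python
import random


def run_greedy(means, horizon, rng, n_init=1):
    """Greedy on a Bernoulli bandit: sample each arm n_init times, then only exploit.

    Returns the number of pulls of each arm after `horizon` rounds.
    """
    k = len(means)
    pulls = [0] * k
    successes = [0] * k

    def pull(arm):
        reward = 1 if rng.random() < means[arm] else 0
        pulls[arm] += 1
        successes[arm] += reward

    # Initial samples: the only "exploration" greedy ever does.
    for arm in range(k):
        for _ in range(n_init):
            pull(arm)

    # Myopic rounds: pick the highest empirical mean, ties broken at random.
    for _ in range(horizon - k * n_init):
        emp = [successes[a] / pulls[a] for a in range(k)]
        best = max(emp)
        pull(rng.choice([a for a in range(k) if emp[a] == best]))

    return pulls


def failure_rate(means, horizon, runs, seed=0, n_init=1):
    """Fraction of runs in which the best arm is never chosen after its initial samples."""
    rng = random.Random(seed)
    best_arm = max(range(len(means)), key=lambda a: means[a])
    failures = 0
    for _ in range(runs):
        pulls = run_greedy(means, horizon, rng, n_init)
        if pulls[best_arm] == n_init:
            failures += 1
    return failures / runs


if __name__ == "__main__":
    # Assumed instance: arm 0 has mean 0.5, arm 1 (the better arm) has mean 0.6.
    print(failure_rate([0.5, 0.6], horizon=200, runs=2000))
```

Increasing `horizon` should not drive the reported fraction toward zero in this setup, which is the qualitative point: a myopic rule can lock onto a worse arm with constant probability.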

Topics

Reinforcement Learning > Methods > Multi-Agent Systems
Mathematics & Optimization > Optimization > Online Algorithms

Keywords

multi-armed bandit, greedy algorithm, social learning, exploration failure, myopic behavior
