Optimal Regret Bounds for Collaborative Learning in Bandits

Amitis Shidani; Sattar Vakili

ALT 2024

Abstract

We consider regret minimization in a general collaborative multi-agent multi-armed bandit model, in which each agent faces a finite set of arms and may communicate with other agents through a central controller. The optimal arm for each agent in this model is the arm with the largest expected mixed reward, where the mixed reward of each arm is a weighted average of its rewards across all agents, making communication among agents crucial. While near-optimal sample complexities for best arm identification are known under this collaborative model, the question of optimal regret remains open. In this work, we address this problem and propose the first algorithm with order-optimal regret bounds under this collaborative bandit model. Furthermore, we show that only a small constant number of communication rounds is needed in expectation.
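The mixed-reward definition in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it only computes each agent's mixed means from known local means and picks the best arm. The function names, the example numbers, and the assumption that each agent's weights are nonnegative and sum to one are all hypothetical choices made for illustration.

```python
def mixed_means(local_means, weights):
    """Return the mixed mean of every arm for every agent.

    local_means[n][k]: expected reward of arm k at agent n (hypothetical data).
    weights[m][n]: weight agent m places on agent n's reward
                   (assumed nonnegative, each row summing to one).
    """
    num_arms = len(local_means[0])
    return [
        [sum(w * local_means[n][k] for n, w in enumerate(row))
         for k in range(num_arms)]
        for row in weights
    ]


def best_mixed_arms(local_means, weights):
    """Return, for each agent, the index of the arm with the largest mixed mean."""
    return [
        max(range(len(row)), key=row.__getitem__)
        for row in mixed_means(local_means, weights)
    ]


# Two agents, two arms (illustrative numbers only).
local_means = [[0.6, 0.5],
               [0.1, 0.9]]
weights = [[0.5, 0.5],   # agent 0 averages both agents' rewards
           [0.0, 1.0]]   # agent 1 only cares about its own rewards

best = best_mixed_arms(local_means, weights)
```

In this example agent 0's locally best arm is arm 0, but its best arm under the mixed reward is arm 1, because arm 1 pays well at agent 1. Agent 0 cannot learn this from its own samples alone, which is why communication matters in this model.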

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems
Artificial Intelligence > Learning Paradigms > Federated Learning
Machine Learning > Optimization & Theory > Online Algorithms

Keywords

federated learning, regret minimization, collaborative learning, multi-armed bandit, regret bound
