Delay and Cooperation in Nonstochastic Bandits

Nicolò Cesa-Bianchi; Claudio Gentile; Yishay Mansour; Alberto Minora

COLT 2016

Abstract

We study networks of communicating learning agents that cooperate to solve a common nonstochastic bandit problem. Agents use an underlying communication network to get messages about actions selected by other agents, and drop messages that took more than $d$ hops to arrive, where $d$ is a delay parameter. We introduce Exp3-Coop, a cooperative version of the Exp3 algorithm, and prove that with $K$ actions and $N$ agents the average per-agent regret after $T$ rounds is at most of order $\sqrt{\left(d+1+\frac{K}{N}\alpha_{\le d}\right)(T\ln K)}$, where $\alpha_{\le d}$ is the independence number of the $d$-th power of the communication graph $G$. We then show that for any connected graph, for $d=\sqrt{K}$ the regret bound is $K^{1/4}\sqrt{T}$, strictly better than the minimax regret $\sqrt{KT}$ for noncooperating agents. More informed choices of $d$ lead to bounds which are arbitrarily close to the full information minimax regret $\sqrt{T\ln K}$ when $G$ is dense. When $G$ has sparse components, we show that a variant of Exp3-Coop, allowing agents to choose their parameters according to their centrality in $G$, strictly improves the regret. Finally, as a by-product of our analysis, we provide the first characterization of the minimax regret for bandit learning with delay.
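To make the quantities in the bound concrete, the following is a minimal Python sketch, not the authors' code. It builds the $d$-th power of a small communication graph, computes its independence number $\alpha_{\le d}$ by brute force, and evaluates the expression $\sqrt{(d+1+\frac{K}{N}\alpha_{\le d})\,T\ln K}$ with constants omitted. The function names, the ring topology, and the values of $K$, $N$, and $T$ are hypothetical choices for illustration. The sketch also assumes that the $0$-th power of a graph has no edges.

```python
import math
from collections import deque
from itertools import combinations


def graph_power(adj, d):
    """Return adjacency sets of the d-th power of an undirected graph.

    Two distinct nodes are adjacent in the d-th power when their
    shortest-path distance in the original graph is at most d.
    Assumption of this sketch: for d = 0 the power graph has no edges.
    """
    power = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == d:
                continue  # do not expand nodes already d hops away
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        power[source] = set(dist) - {source}
    return power


def independence_number(adj):
    """Brute-force independence number; only feasible for tiny graphs."""
    nodes = list(adj)
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all(v not in adj[u] for u, v in combinations(subset, 2)):
                return size
    return 0


def regret_bound_order(K, N, T, d, alpha):
    """Evaluate sqrt((d + 1 + (K / N) * alpha) * T * ln K), constants omitted."""
    return math.sqrt((d + 1 + (K / N) * alpha) * T * math.log(K))


# Hypothetical example: a ring of N = 8 agents, K = 16 actions, T = 10,000 rounds.
N = 8
ring = {i: {(i - 1) % N, (i + 1) % N} for i in range(N)}
K, T = 16, 10_000
for d in (0, 1, 2):
    alpha = independence_number(graph_power(ring, d))
    print(d, alpha, round(regret_bound_order(K, N, T, d, alpha), 1))
```

On this ring, increasing $d$ shrinks $\alpha_{\le d}$ while the additive $d+1$ term grows, which is the trade-off the abstract refers to when it speaks of more informed choices of $d$.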

Authors

Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour, Alberto Minora

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems

Machine Learning > Optimization & Theory > Learning Theory

Mathematics & Optimization > Optimization > Online Algorithms

Machine Learning > Optimization & Theory > Stochastic Methods

Keywords

multi-agent learning; regret bound; communication network; network centrality; nonstochastic bandit

Related papers

Open Problem: Approximate Planning of POMDPs in the class of Memoryless Policies 2016

Open Problem: Best Arm Identification: Almost Instance-Wise Optimality and the Gap Entropy Conjecture 2016

Open Problem: Kernel methods on manifolds and metric spaces. What is the probability of a positive definite geodesic exponential kernel? 2016

Learning and Testing Junta Distributions 2016

Monte Carlo Markov Chain Algorithms for Sampling Strongly Rayleigh Distributions and Determinantal Point Processes 2016