From Bandits to Experts: A Tale of Domination and Independence

Noga Alon; Nicolò Cesa-Bianchi; Claudio Gentile; Yishay Mansour

NIPS (NeurIPS) 2013

Abstract

We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir (2011). Our main result is a characterization of regret in the directed observability model in terms of the dominating and independence numbers of the observability graph. We also show that in the undirected case, the learner can achieve optimal regret without even accessing the observability graph before selecting an action. Both results are shown using variants of the Exp3 algorithm operating on the observability graph in a time-efficient manner.
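
To make the setting concrete, here is a minimal sketch of an Exp3-style update with side observations. It is an illustration under stated assumptions, not necessarily the exact algorithm analyzed in the paper: the function name `exp3_side_observations`, the `neighbors` representation of the observability graph, the fixed learning rate `eta`, and the assumption that losses lie in [0, 1] are all choices made for this example. Playing action `j` reveals the loss of `j` and of every action in `neighbors[j]`; each revealed loss is divided by the probability that it would have been revealed, which keeps the estimate unbiased.

```python
import math
import random


def exp3_side_observations(losses, neighbors, eta, seed=0):
    """Exponential weights with importance-weighted side observations.

    losses[t][i]  : loss of action i at round t, assumed to lie in [0, 1]
    neighbors[j]  : actions whose loss is revealed when j is played
                    (j's own loss is always revealed)
    eta           : learning rate (hypothetical fixed value, not tuned)
    """
    rng = random.Random(seed)
    k = len(neighbors)
    weights = [1.0] * k
    total_loss = 0.0

    for loss_t in losses:
        total = sum(weights)
        probs = [w / total for w in weights]

        # Sample the action to play from the current distribution.
        played = rng.choices(range(k), weights=probs)[0]
        total_loss += loss_t[played]

        # Losses revealed this round: the played action and its neighbors.
        observed = set(neighbors[played]) | {played}
        for i in observed:
            # Probability that the loss of i is revealed in this round.
            q = sum(probs[j] for j in range(k) if j == i or i in neighbors[j])
            estimate = loss_t[i] / q
            weights[i] *= math.exp(-eta * estimate)

    total = sum(weights)
    return total_loss, [w / total for w in weights]
```

When every `neighbors[j]` is empty, the update reduces to a loss-based Exp3 (the bandit setting); when every action reveals all others, each `q` equals 1 and the update becomes the full-information exponential-weights rule (the experts setting). The sketch does not guard against weight underflow over very long runs.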

Topics

Machine Learning > Optimization & Theory > Theory
Mathematics & Optimization > Mathematics > Graph Theory
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Learning Types > Online Learning
Machine Learning > Optimization & Theory > Online Algorithms
Machine Learning > Learning Types > Multi-Armed Bandits
Machine Learning > Core Methods > Optimization

Keywords

regret minimization, partial observability, regret, Exp3 algorithm, observability graph, multi-armed bandit, independence number, domination number
