Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks

Swapna Buccapatnam; Fang Liu; Atilla Eryilmaz; Ness B. Shroff

2018 JMLR JMLR 2018

Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks

Abstract

We study the stochastic multi-armed bandit (MAB) problem in the presence of side-observations across actions that occur as a result of an underlying network structure. In our model, a bipartite graph captures the relationship between actions and a common set of unknowns such that choosing an action reveals observations for the unknowns that it is connected to. This models a common scenario in online social networks where users respond to their friends' activity, thus providing side information about each other's preferences. Our contributions are as follows: 1) We derive an asymptotic lower bound (with respect to time) as a function of the bi-partite network structure on the regret of any uniformly good policy that achieves the maximum long-term average reward. 2) We propose two policies - a randomized policy; and a policy based on the well- known upper confidence bound (UCB) policies - both of which explore each action at a rate that is a function of its network position. We show, under mild assumptions, that these policies achieve the asymptotic lower bound on the regret up to a multiplicative factor, independent of the network structure. Finally, we use numerical examples on a real-world social network and a routing example network to demonstrate the benefits obtained by our policies over other existing policies. [abs] [ pdf ][ bib ] © JMLR 2018. (edit, beta)

🐣 Hot Topic Early Bird — upper confidence bound

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Swapna Buccapatnam , Fang Liu , Atilla Eryilmaz , Ness B. Shroff

Topics

Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Learning Types > Online Learning Machine Learning > Learning Types > Reinforcement Learning Machine Learning > Optimization & Theory > Online Algorithms Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

network structure multi-armed bandit upper confidence bound regret bound side observation

Download PDF

Related papers

Simple Classification Using Binary Data 2018

ELFI: Engine for Likelihood-Free Inference 2018

Refining the Confidence Level for Optimistic Bandit Strategies 2018

An Efficient and Effective Generic Agglomerative Hierarchical Clustering Approach 2018

Convergence of Unregularized Online Learning Algorithms 2018