Normal Bandits of Unknown Means and Variances

Wesley Cowan; Junya Honda; Michael N. Katehakis

2018 JMLR JMLR 2018

Normal Bandits of Unknown Means and Variances

Abstract

Consider the problem of sampling sequentially from a finite number of $N \geq 2$ populations, specified by random variables $X^i_k$, $ i = 1,\ldots , N,$ and $k = 1, 2, \ldots$; where $X^i_k$ denotes the outcome from population $i$ the $k^{th}$ time it is sampled. It is assumed that for each fixed $i$, $\{ X^i_k \}_{k \geq 1}$ is a sequence of i.i.d. normal random variables, with unknown mean $\mu_i$ and unknown variance $\sigma_i^2$. The objective is to have a policy $\pi$ for deciding from which of the $N$ populations to sample from at any time $t=1,2,\ldots$ so as to maximize the expected sum of outcomes of $n$ total samples or equivalently to minimize the regret due to lack on information of the parameters $\mu_i$ and $\sigma_i^2$. In this paper, we present a simple inflated sample mean (ISM) index policy that is asymptotically optimal in the sense of Theorem 4 below. This resolves a standing open problem from \cite{bkmab96}. Additionally, finite horizon regret bounds are given. [abs] [ pdf ][ bib ] © JMLR 2018. (edit, beta)

🧭 Keyword Pioneer — inflated sample mean

🐣 Hot Topic Early Bird — optimal policy

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Wesley Cowan , Junya Honda , Michael N. Katehakis

Topics

Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Optimization & Theory > Statistical Learning Machine Learning > Optimization & Theory > Online Algorithms Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

optimal policy sequential sampling multi-armed bandit regret bound normal distribution inflated sample mean

Download PDF

Related papers

Simple Classification Using Binary Data 2018

ELFI: Engine for Likelihood-Free Inference 2018

Refining the Confidence Level for Optimistic Bandit Strategies 2018

An Efficient and Effective Generic Agglomerative Hierarchical Clustering Approach 2018

Convergence of Unregularized Online Learning Algorithms 2018