Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit

Alexandra Carpentier; Rémi Munos

2012 AISTATS AISTATS 2012

Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit

Abstract

We consider a linear stochastic bandit problem where the dimension K of the unknown parameter heta is larger than the sampling budget n. Since usual linear bandit algorithms have a regret in O(K\sqrtn), it is in general impossible to obtain a sub-linear regret without further assumption. In this paper we make the assumption that heta is S-sparse, i.e. has at most S-non-zero components, and that the space of arms is the unit ball for the ||.||_2 norm. We combine ideas from Compressed Sensing and Bandit Theory to derive an algorithm with a regret bound in O(S\sqrtn). We detail an application to the problem of optimizing a function that depends on many variables but among which only a small number of them (initially unknown) are relevant.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — linear bandit

🐣 Hot Topic Early Bird — stochastic optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Reinforcement Learning, Robotics, Security & Privacy

Authors

Alexandra Carpentier , Rémi Munos

Topics

Machine Learning > Learning Types > Active Learning Machine Learning > Optimization & Theory > Optimization Mathematics & Optimization > Optimization > Stochastic Methods Machine Learning > Optimization & Theory > Online Algorithms Machine Learning > Learning Types > Multi-Armed Bandits Mathematics & Optimization > Optimization > Sparse Optimization

Keywords

stochastic optimization sparse recovery compressed sensing sparse optimization regret bound online algorithm linear bandit stochastic bandit

Download PDF

Related papers

Minimax rates for homology inference 2012

Scalable Personalization of Long-Term Physiological Monitoring: Active Learning Methodologies for Epileptic Seizure Onset Detection 2012

Adaptive Metropolis with Online Relabeling 2012

Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing 2012

Bayesian regularization of non-homogeneous dynamic Bayesian networks by globally coupling interaction parameters 2012