Sparse Stochastic Bandits

Joon Kwon; Vianney Perchet; Claire Vernade

COLT 2017

Abstract

In the classical multi-armed bandit problem, $d$ arms are available to the decision maker who pulls them sequentially in order to maximize his cumulative reward. Guarantees can be obtained on a relative quantity called regret, which scales linearly with $d$ (or with $\sqrt{d}$ in the minimax sense). We here consider the sparse case of this classical problem in the sense that only a small number of arms, namely $s$ [...]

Citation: Joon Kwon, Vianney Perchet, and Claire Vernade. Sparse Stochastic Bandits. In Proceedings of the 2017 Conference on Learning Theory, edited by Satyen Kale and Ohad Shamir, Proceedings of Machine Learning Research, volume 65, pages 1269-1270. PMLR, 07-10 Jul 2017. Identifier: pmlr-v65-kwon17a.

PDF: http://proceedings.mlr.press/v65/kwon17a/kwon17a.pdf

URL: https://proceedings.mlr.press/v65/kwon17a.html
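
To make the notion of regret in the abstract concrete, here is a minimal sketch of the classical (non-sparse) setting. It is not the algorithm of the paper, which this page does not describe. UCB1 is used as a standard baseline, and the Bernoulli rewards, the arm means, the horizon, and the function name `ucb1` are all illustrative assumptions.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms (assumed reward model).

    Returns the cumulative pseudo-regret and the number of pulls per arm.
    """
    rng = random.Random(seed)
    d = len(means)
    counts = [0] * d    # number of pulls of each arm
    sums = [0.0] * d    # cumulative observed reward of each arm
    best = max(means)   # mean of the best arm, used only to measure regret
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= d:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(d),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # expected loss of this pull
    return regret, counts


if __name__ == "__main__":
    regret, counts = ucb1([0.9, 0.5, 0.5, 0.5], horizon=2000)
    print(f"pseudo-regret: {regret:.1f}, pulls per arm: {counts}")
```

Every suboptimal arm has to be sampled enough times to be ruled out, which is why the regret of such a strategy grows with the number of arms $d$.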

Topics

Mathematics & Optimization > Optimization > Online Algorithms

Keywords

arm selection, multi-armed bandit, regret bound, sparse stochastic bandit

Related papers

Ignoring Is a Bliss: Learning with Large Noise Through Reweighting-Minimization (2017)

Open Problem: First-Order Regret Bounds for Contextual Bandits (2017)

Open Problem: Meeting Times for Learning Random Automata (2017)

Corralling a Band of Bandit Algorithms (2017)

Learning with Limited Rounds of Adaptivity: Coin Tossing, Multi-Armed Bandits, and Ranking from Pairwise Comparisons (2017)