Open Problem: First-Order Regret Bounds for Contextual Bandits

Alekh Agarwal; Akshay Krishnamurthy; John Langford; Haipeng Luo; Robert E. Schapire

COLT 2017

Abstract

We describe two open problems related to first-order regret bounds for contextual bandits. The first asks for an algorithm with a regret bound of $\tilde{\mathcal{O}}(\sqrt{L_\star}K \ln N)$, where there are $K$ actions, $N$ policies, and $L_\star$ is the cumulative loss of the best policy. The second asks for an optimization-oracle-efficient algorithm with regret $\tilde{\mathcal{O}}(L_\star^{2/3}\,\mathrm{poly}(K, \ln(N/\delta)))$. We describe some positive results, such as an inefficient algorithm for the second problem, and some partial negative results.
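
For orientation, the quantities in these bounds can be written out as follows. This is a sketch in standard contextual-bandit notation that the abstract itself does not fix: the horizon $T$, contexts $x_t$, chosen actions $a_t$, loss functions $\ell_t$ taking values in $[0,1]$, and the policy class $\Pi$ with $|\Pi| = N$ are all assumptions introduced here.

$$
L_\star = \min_{\pi \in \Pi} \sum_{t=1}^{T} \ell_t(\pi(x_t)), \qquad \mathrm{Reg}_T = \sum_{t=1}^{T} \ell_t(a_t) - L_\star .
$$

Under the assumed $[0,1]$ loss range, $L_\star \le T$, so a bound stated in terms of $L_\star$ is never worse than the same bound with $T$ in its place, and it is much smaller when the best policy incurs little loss.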

Authors

Alekh Agarwal, Akshay Krishnamurthy, John Langford, Haipeng Luo, Robert E. Schapire

Topics

Machine Learning > Optimization & Theory > Learning Theory

Machine Learning > Optimization & Theory > Theory

Mathematics & Optimization > Optimization > Online Algorithms

Keywords

learning theory, regret bound, contextual bandit, first-order regret, optimization oracle
