Tractable contextual bandits beyond realizability

Sanath Kumar Krishnamurthy; Vitor Hadad; Susan Athey

2021 AISTATS AISTATS 2021

Tractable contextual bandits beyond realizability

Abstract

Tractable contextual bandit algorithms often rely on the realizability assumption – i.e., that the true expected reward model belongs to a known class, such as linear functions. In this work, we present a tractable bandit algorithm that is not sensitive to the realizability assumption and computationally reduces to solving a constrained regression problem in every epoch. When realizability does not hold, our algorithm ensures the same guarantees on regret achieved by realizability-based algorithms under realizability, up to an additive term that accounts for the misspecification error. This extra term is proportional to T times a function of the mean squared error between the best model in the class and the true model, where T is the total number of time-steps. Our work sheds light on the bias-variance trade-off for tractable contextual bandits. This trade-off is not captured by algorithms that assume realizability, since under this assumption there exists an estimator in the class that attains zero bias.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Reinforcement Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Sanath Kumar Krishnamurthy , Vitor Hadad , Susan Athey

Topics

Machine Learning > Core Methods > Regression Machine Learning > Optimization & Theory > Learning Theory Reinforcement Learning > Methods > Multi-Agent Systems Artificial Intelligence > Learning Paradigms > Multi-Agent Systems

Keywords

regret analysis contextual bandit bias-variance trade-off misspecification error constrained regression

Download PDF

Related papers

Linear Regression Games: Convergence Guarantees to Approximate Out-of-Distribution Solutions 2021

Semi-Supervised Learning with Meta-Gradient 2021

Accelerating Metropolis-Hastings with Lightweight Inference Compilation 2021

When MAML Can Adapt Fast and How to Assist When It Cannot 2021

On the convergence of the Metropolis algorithm with fixed-order updates for multivariate binary probability distributions 2021