Meta Learning in Bandits within Shared Affine Subspaces

Steven Bilaj; Sofien Dhouib; Setareh Maghsudi

AISTATS 2024

Abstract

We study the problem of meta-learning several contextual stochastic bandit tasks by leveraging their concentration around a low-dimensional affine subspace, which we learn via online principal component analysis to reduce the expected regret over the encountered bandits. We propose and theoretically analyze two strategies that solve the problem: one based on the principle of optimism in the face of uncertainty and the other on Thompson sampling. Our framework is generic and includes previously proposed approaches as special cases. Moreover, empirical results show that our methods significantly reduce the regret on several bandit tasks.
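
To make the central idea concrete, the sketch below shows how an affine subspace (a mean vector plus a few principal directions) can be recovered from a collection of task parameter vectors. It is an illustration under stated assumptions, not the authors' algorithm: it uses batch PCA via SVD rather than the online PCA named in the abstract, it assumes the task parameters are observed directly (in a bandit setting only estimates would be available), and the names fit_affine_subspace, project_onto_subspace, and all dimensions are hypothetical.

```python
import numpy as np

def fit_affine_subspace(thetas, k):
    """Estimate an affine subspace from the rows of `thetas` (hypothetical helper).

    Returns the empirical mean and a (d, k) matrix whose orthonormal columns
    are the top-k principal directions of the centered data.
    """
    mean = thetas.mean(axis=0)
    centered = thetas - mean
    # Right singular vectors of the centered data are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k].T
    return mean, basis

def project_onto_subspace(theta, mean, basis):
    """Orthogonally project `theta` onto the affine subspace mean + span(basis)."""
    return mean + basis @ (basis.T @ (theta - mean))

# Synthetic tasks (assumption): parameters concentrated around a 2-dimensional
# affine subspace of R^10, perturbed by small isotropic noise.
rng = np.random.default_rng(0)
d, k, n_tasks = 10, 2, 200
true_mean = rng.normal(size=d)
true_basis, _ = np.linalg.qr(rng.normal(size=(d, k)))
coeffs = rng.normal(size=(n_tasks, k))
thetas = true_mean + coeffs @ true_basis.T + 0.01 * rng.normal(size=(n_tasks, d))

mean, basis = fit_affine_subspace(thetas, k)

# Distance of the first task parameter from the estimated subspace.
residual = thetas[0] - project_onto_subspace(thetas[0], mean, basis)
print("residual norm:", np.linalg.norm(residual))
```

A small residual norm indicates that the task parameter lies close to the estimated subspace, which is the structure a meta-learner could exploit to shrink the effective dimension of each new bandit task.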

Topics

Artificial Intelligence > Learning Paradigms > Meta-Learning
Machine Learning > Core Methods > Embedding Learning
Machine Learning > Optimization & Theory > Learning Theory
Machine Learning > Learning Paradigms > Meta-Learning
Machine Learning > Learning Types > Meta-Learning
Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

meta learning, principal component analysis, regret minimization, Thompson sampling, multi-armed bandit, contextual bandit, affine subspace
