Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning

Ahmadreza Moradipari; Mohammad Pedramfar; Modjtaba Shokrian Zini; Vaneet Aggarwal

2023 NIPS NeurIPS 2023

Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning

Abstract

In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in reinforcement learning in a multitude of settings. We present a refined analysis of the information ratio, and show an upper bound of order $\widetilde{O}(H\sqrt{d_{l_1}T})$ in the time inhomogeneous reinforcement learning problem where $H$ is the episode length and $d_{l_1}$ is the Kolmogorov $l_1-$dimension of the space of environments. We then find concrete bounds of $d_{l_1}$ in a variety of settings, such as tabular, linear and finite mixtures, and discuss how our results improve the state-of-the-art.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Ahmadreza Moradipari , Mohammad Pedramfar , Modjtaba Shokrian Zini , Vaneet Aggarwal

Topics

Machine Learning > Optimization & Theory > Bayesian Inference Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Learning Types > Reinforcement Learning Machine Learning > Bayesian & Probabilistic > Bayesian Inference

Keywords

learning theory reinforcement learning thompson sampling bayesian regret bayesian regret bound information ratio kolmogorov dimension

Download PDF

Related papers

Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning 2023

Generative Modeling through the Semi-dual Formulation of Unbalanced Optimal Transport 2023

Self-Supervised Motion Magnification by Backpropagating Through Optical Flow 2023

Diffused Task-Agnostic Milestone Planner 2023

Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond 2023