Epistemic Bellman Operators

Pascal R. van der Vaart; Matthijs T. J. Spaan; Neil Yorke-Smith

AAAI 2025

Abstract

Uncertainty quantification remains a difficult challenge in reinforcement learning. Several algorithms successfully quantify uncertainty in practical settings. However, it is unclear whether these algorithms are theoretically sound and can be expected to converge. Furthermore, they appear to treat the uncertainty in the target parameters in different ways. In this work, we unify several practical algorithms into one theoretical framework by defining a new Bellman operator on distributions, and we show that this Bellman operator is a contraction. We highlight use cases of our framework by analyzing an existing Bayesian Q-learning algorithm, and we introduce a novel uncertainty-aware variant of PPO that adaptively sets its clipping hyperparameter.

Topics

Machine Learning > Optimization & Theory > Bayesian Inference; Reinforcement Learning > Methods > Deep RL; Artificial Intelligence > Bayesian & Probabilistic > Bayesian Inference; Machine Learning > Learning Types > Uncertainty Quantification; Artificial Intelligence > Core AI > Reinforcement Learning

Keywords

reinforcement learning theory; uncertainty quantification; epistemic uncertainty; distributional reinforcement learning; Bellman operator; contraction mapping; Bayesian Q-learning
