Model-Value Inconsistency as a Signal for Epistemic Uncertainty

Angelos Filos; Eszter Vértes; Zita Marinho; Gregory Farquhar; Diana Borsa; Abram Friesen; Feryal Behbahani; Tom Schaul; Andre Barreto; Simon Osindero

ICML 2022

Abstract

Using a model of the environment and a value function, an agent can construct many estimates of a state’s value by unrolling the model for different lengths and bootstrapping with its value function. Our key insight is that one can treat this set of value estimates as a type of ensemble, which we call an implicit value ensemble (IVE). Consequently, the discrepancy between these estimates can be used as a proxy for the agent’s epistemic uncertainty; we term this signal model-value inconsistency, or self-inconsistency for short. Unlike prior work, which estimates uncertainty by training an ensemble of many models and/or value functions, this approach requires only the single model and value function that are already being learned in most model-based reinforcement learning algorithms. We provide empirical evidence, in both tabular settings and function approximation settings from pixels, that self-inconsistency is useful (i) as a signal for exploration, (ii) for acting safely under distribution shifts, and (iii) for robustifying value-based planning with a learned model.
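
The construction described in the abstract can be illustrated with a minimal sketch. It assumes a deterministic tabular model, a fixed policy, and the population standard deviation as the discrepancy measure; these are simplifications chosen for illustration, and every name below (k_step_value, implicit_value_ensemble, self_inconsistency, and the toy dictionaries) is hypothetical rather than taken from the paper.

```python
from statistics import pstdev

def k_step_value(state, k, model, reward, value, policy, gamma):
    """Unroll the (assumed deterministic) model for k steps, then bootstrap."""
    total, discount = 0.0, 1.0
    for _ in range(k):
        action = policy[state]
        total += discount * reward[(state, action)]  # model-predicted reward
        state = model[(state, action)]               # model-predicted next state
        discount *= gamma
    return total + discount * value[state]           # bootstrap with the value function

def implicit_value_ensemble(state, n, model, reward, value, policy, gamma):
    """Value estimates for unroll lengths k = 0, 1, ..., n."""
    return [k_step_value(state, k, model, reward, value, policy, gamma)
            for k in range(n + 1)]

def self_inconsistency(state, n, model, reward, value, policy, gamma):
    """Discrepancy across the ensemble; the standard deviation is an assumption."""
    return pstdev(implicit_value_ensemble(state, n, model, reward, value,
                                          policy, gamma))

# Toy problem (hypothetical): one state with a self-loop that pays reward 1.
model = {(0, 0): 0}
reward = {(0, 0): 1.0}
policy = {0: 0}
gamma = 0.5

consistent_value = {0: 2.0}    # equals 1 / (1 - gamma), so it agrees with the model
inconsistent_value = {0: 0.0}  # disagrees with what the model predicts

consistent_score = self_inconsistency(0, 2, model, reward, consistent_value,
                                      policy, gamma)
inconsistent_ensemble = implicit_value_ensemble(0, 2, model, reward,
                                                inconsistent_value, policy, gamma)
inconsistent_score = self_inconsistency(0, 2, model, reward, inconsistent_value,
                                        policy, gamma)
```

In this toy case the value function that agrees with the model yields identical estimates at every unroll length, so the signal is zero, while the mismatched value function yields estimates that differ across unroll lengths and a positive signal. Only one model and one value function are used in either case.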

Topics

Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Applications > Value Iteration

Keywords

epistemic uncertainty, model-based reinforcement learning, value estimation, implicit ensemble

Related papers

Generic Coreset for Scalable Learning of Monotonic Kernels: Logistic Regression, Sigmoid and more (2022)

Active fairness auditing (2022)

Toward Compositional Generalization in Object-Oriented World Modeling (2022)

Robustness Verification for Contrastive Learning (2022)
