The Uncertainty Bellman Equation and Exploration

Brendan O’Donoghue; Ian Osband; Rémi Munos; Vlad Mnih

2018 ICML ICML 2018

The Uncertainty Bellman Equation and Exploration

Abstract

We consider the exploration/exploitation problem in reinforcement learning. For exploitation, it is well known that the Bellman equation connects the value at any time-step to the expected value at subsequent time-steps. In this paper we consider a similar uncertainty Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps. We prove that the unique fixed point of the UBE yields an upper bound on the variance of the posterior distribution of the Q-values induced by any policy. This bound can be much tighter than traditional count-based bonuses that compound standard deviation rather than variance. Importantly, and unlike several existing approaches to optimism, this method scales naturally to large systems with complex generalization. Substituting our UBE-exploration strategy for $\epsilon$-greedy improves DQN performance on 51 out of 57 games in the Atari suite.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Reinforcement Learning

📈 Trend Setter — AI Safety

🧭 Keyword Pioneer — deep q-network

🐝 Cross-Pollinator — Artificial Intelligence, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Reinforcement Learning, Security & Privacy

Authors

Brendan O’Donoghue , Ian Osband , Rémi Munos , Vlad Mnih

Topics

Artificial Intelligence > Core AI > AI Safety Machine Learning > Optimization & Theory > Bayesian Inference Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Policy Learning Machine Learning > Learning Types > Uncertainty Quantification

Keywords

reinforcement learning posterior distribution exploration-exploitation tradeoff deep q-network variance bound uncertainty bellman equation count-based bonus

Download PDF

Related papers

Rectify Heterogeneous Models with Semantic Mapping 2018

Bayesian Optimization of Combinatorial Structures 2018

The Well-Tempered Lasso 2018

Approximation Algorithms for Cascading Prediction Models 2018

Classification from Pairwise Similarity and Unlabeled Data 2018