An Analysis of Categorical Distributional Reinforcement Learning

Mark Rowland; Marc Bellemare; Will Dabney; Rémi Munos; Yee Whye Teh

AISTATS 2018

Abstract

Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance. This was demonstrated by the recently proposed C51 algorithm, based on categorical distributional reinforcement learning (CDRL) [Bellemare et al., 2017]. However, the theoretical properties of CDRL algorithms are not yet well understood. In this paper, we introduce a framework to analyse CDRL algorithms, establish the importance of the projected distributional Bellman operator in distributional RL, draw fundamental connections between CDRL and the Cramér distance, and give a proof of convergence for sample-based categorical distributional reinforcement learning algorithms.
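The abstract names two objects without defining them: the projection used by the categorical Bellman update and the Cramér distance. The sketch below illustrates both under stated assumptions and is not taken from the paper. It assumes a fixed, evenly spaced support, the usual C51-style linear-interpolation projection, and the Cramér distance defined as the square root of the integrated squared difference between CDFs. The function names `project_onto_support` and `cramer_distance` are hypothetical.

```python
def project_onto_support(atoms, probs, support):
    """Project a discrete distribution onto an evenly spaced support.

    Each atom is clipped to the support range and its mass is split
    between the two neighbouring support points in proportion to
    proximity (assumed C51-style projection).
    """
    z_min, z_max = support[0], support[-1]
    dz = support[1] - support[0]
    last = len(support) - 1
    out = [0.0] * len(support)
    for y, p in zip(atoms, probs):
        y = min(max(y, z_min), z_max)   # clip to [z_min, z_max]
        b = (y - z_min) / dz            # fractional index on the grid
        lo = min(int(b), last)          # lower neighbour (b >= 0)
        hi = min(lo + 1, last)          # upper neighbour
        w_hi = b - lo                   # share of mass for the upper one
        out[lo] += p * (1.0 - w_hi)
        out[hi] += p * w_hi
    return out


def cramer_distance(p, q, dz):
    """Cramér distance between two categorical distributions that
    share one evenly spaced support with spacing dz (assumed form:
    square root of the integrated squared CDF difference)."""
    cdf_gap = 0.0
    total = 0.0
    for pk, qk in zip(p, q):
        cdf_gap += pk - qk              # running difference of the CDFs
        total += cdf_gap ** 2
    return (dz * total) ** 0.5


# One projected Bellman backup for a single transition (illustrative values).
support = [0.0, 1.0, 2.0, 3.0, 4.0]
next_probs = [0.0, 0.0, 1.0, 0.0, 0.0]   # next-state return is a point mass at 2
reward, gamma = 0.5, 0.5
shifted_atoms = [reward + gamma * z for z in support]
target = project_onto_support(shifted_atoms, next_probs, support)
```

Here the backed-up atom lands at 1.5, between two support points, so the projection splits its mass evenly between 1 and 2 and preserves the mean.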

Topics

Artificial Intelligence > Core AI > Foundation Models
Machine Learning > Optimization & Theory > Theory
Reinforcement Learning > Methods > Deep RL
Machine Learning > Learning Types > Reinforcement Learning

Keywords

distributional reinforcement learning; Bellman operator; convergence guarantee; Cramér distance; value-based reinforcement learning; categorical distribution; C51 algorithm
