Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning

Nino Vieillard; Tadashi Kozuno; Bruno Scherrer; Olivier Pietquin; Rémi Munos; Matthieu Geist

2020 NIPS NeurIPS 2020

Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning

Abstract

Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance. Yet, only little is understood theoretically about why KL regularization helps, so far. We study KL regularization within an approximate value iteration scheme and show that it implicitly averages q-values. Leveraging this insight, we provide a very strong performance bound, the very first to combine two desirable aspects: a linear dependency to the horizon (instead of quadratic) and an error propagation term involving an averaging effect of the estimation errors (instead of an accumulation effect). We also study the more general case of an additional entropy regularizer. The resulting abstract scheme encompasses many existing RL algorithms. Some of our assumptions do not hold with neural networks, so we complement this theoretical analysis with an extensive empirical study.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Nino Vieillard , Tadashi Kozuno , Bruno Scherrer , Olivier Pietquin , Rémi Munos , Matthieu Geist

Topics

Machine Learning > Optimization & Theory > Optimization Reinforcement Learning > Methods > Deep RL

Keywords

reinforcement learning value iteration kl regularization

Download PDF

Related papers

Higher-Order Spectral Clustering of Directed Graphs 2020

Self-Supervised MultiModal Versatile Networks 2020

Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates 2020

Causal Intervention for Weakly-Supervised Semantic Segmentation 2020

Taming Discrete Integration via the Boon of Dimensionality 2020