Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

Harm Van Seijen; Mehdi Fatemi; Arash Tavakoli

2019 NIPS NeurIPS 2019

Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

Abstract

In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation. Our analysis reveals that the common perception that poor performance of low discount factors is caused by (too) small action-gaps requires revision. We propose an alternative hypothesis that identifies the size-difference of the action-gap across the state-space as the primary cause. We then introduce a new method that enables more homogeneous action-gaps by mapping value estimates to a logarithmic space. We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods. This in turn allows tackling a class of reinforcement-learning problems that are challenging to solve with traditional methods.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — logarithmic mapping

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Harm Van Seijen , Mehdi Fatemi , Arash Tavakoli

Topics

Reinforcement Learning > Methods > Deep RL Machine Learning > Learning Types > Reinforcement Learning Deep Learning > Learning Types > Reinforcement Learning Artificial Intelligence > Core AI > Reinforcement Learning

Keywords

reinforcement learning policy optimization discount factor value function value estimation action gap logarithmic mapping

Download PDF

Related papers

Two Generator Game: Learning to Sample via Linear Goodness-of-Fit Test 2019

Metalearned Neural Memory 2019

Model Similarity Mitigates Test Set Overuse 2019

Continual Unsupervised Representation Learning 2019

Reinforcement Learning with Convex Constraints 2019