Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning

Motoki Omura; Takayuki Osa; Yusuke Mukuta; Tatsuya Harada

AAAI 2024

Abstract

In deep reinforcement learning, estimating the value function to evaluate the quality of states and actions is essential. The value function is often trained with the least squares method, which implicitly assumes a Gaussian error distribution. However, a recent study suggested that the error distribution for training the value function is often skewed because of the properties of the Bellman operator, which violates the normal-error assumption implicit in the least squares method. To address this, we propose Symmetric Q-learning, in which synthetic noise drawn from a zero-mean distribution is added to the target values to produce a Gaussian error distribution. We evaluate the proposed method on continuous control benchmark tasks in MuJoCo, where it improves the sample efficiency of a state-of-the-art reinforcement learning method by reducing the skewness of the error distribution.
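The core idea in the abstract, adding zero-mean noise to the targets so that the error distribution becomes less skewed, can be illustrated with a small numerical sketch. This is not the authors' implementation: the error and noise distributions here are hand-picked assumptions (a shifted gamma error and its mirror image as noise), and the names `sample_skewness` and `add_symmetrizing_noise` are hypothetical. The abstract does not say how the noise distribution is chosen in practice.

```python
import numpy as np


def sample_skewness(x):
    """Standardized third central moment of a 1-D sample."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float(np.mean(centered**3) / np.mean(centered**2) ** 1.5)


def add_symmetrizing_noise(targets, rng):
    """Add zero-mean, negatively skewed noise to the targets.

    Assumption for illustration only: the noise is the mirror image of a
    Gamma(2, 1) variable shifted to zero mean, chosen by hand to cancel
    the positive skew of the simulated error below.
    """
    noise = 2.0 - rng.gamma(shape=2.0, scale=1.0, size=targets.shape)
    return targets + noise


rng = np.random.default_rng(0)
n = 100_000

# Stand-ins for Q-value predictions and their (skewed) regression targets.
q_values = rng.normal(size=n)
skewed_error = rng.gamma(shape=2.0, scale=1.0, size=n) - 2.0  # zero mean, right-skewed
targets = q_values + skewed_error

noisy_targets = add_symmetrizing_noise(targets, rng)

skew_before = sample_skewness(targets - q_values)
skew_after = sample_skewness(noisy_targets - q_values)
print(f"skewness before: {skew_before:.3f}, after: {skew_after:.3f}")
```

Because the added noise has zero mean, the expected target is unchanged. Only the shape of the error distribution changes, at the cost of a larger error variance in this toy setup.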

Topics

Machine Learning > Optimization & Theory > Loss Functions
Reinforcement Learning > Methods > Deep RL
Machine Learning > Learning Types > Reinforcement Learning
Reinforcement Learning > Methods > Value Iteration

Keywords

deep reinforcement learning, reinforcement learning, sample efficiency, value function, Bellman error, error distribution
