Promoting Stochasticity for Expressive Policies via a Simple and Efficient Regularization Method

Qi Zhou; Yufei Kuang; Zherui Qiu; Houqiang Li; Jie Wang

NeurIPS 2020

Abstract

Many recent reinforcement learning (RL) methods learn stochastic policies with entropy regularization for exploration and robustness. However, in continuous action spaces, integrating entropy regularization with expressive policies is challenging and usually requires complex inference procedures. To tackle this problem, we propose a novel regularization method that is compatible with a broad range of expressive policy architectures. An appealing feature is that the estimation of our regularization terms is simple and efficient even when the policy distributions are unknown. We show that our approach can effectively promote exploration in continuous action spaces. Based on our regularization, we propose an off-policy actor-critic algorithm. Experiments demonstrate that the proposed algorithm outperforms state-of-the-art regularized RL methods in continuous control tasks.
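To illustrate what "simple and efficient even when the policy distributions are unknown" can mean in practice, here is a minimal sketch of a regularization term estimated purely from sampled actions. The abstract does not specify the form of the regularizer, so the choice of an energy distance to a uniform reference distribution is an assumption made for illustration, as are all function and variable names. The sketch only needs samples from the policy, never its density, which is the property the abstract highlights.

```python
import numpy as np


def mean_pairwise_distance(a, b):
    """Average Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]  # shape (len(a), len(b), action_dim)
    return np.linalg.norm(diff, axis=-1).mean()


def energy_distance(policy_actions, reference_actions):
    """Sample-based energy distance: 2 E|X-Y| - E|X-X'| - E|Y-Y'|.

    Illustrative assumption: a small value means the sampled actions are
    spread out like the reference samples, so minimizing it would push the
    policy toward stochastic behavior. Only samples are required.
    """
    cross = mean_pairwise_distance(policy_actions, reference_actions)
    within_policy = mean_pairwise_distance(policy_actions, policy_actions)
    within_reference = mean_pairwise_distance(reference_actions, reference_actions)
    return 2.0 * cross - within_policy - within_reference


rng = np.random.default_rng(0)

# Hypothetical setup: 2-D actions bounded in [-1, 1].
reference = rng.uniform(-1.0, 1.0, size=(256, 2))
spread_policy = rng.uniform(-1.0, 1.0, size=(256, 2))    # highly stochastic policy
collapsed_policy = 0.01 * rng.standard_normal((256, 2))  # nearly deterministic policy

d_spread = energy_distance(spread_policy, reference)
d_collapsed = energy_distance(collapsed_policy, reference)
print(f"spread policy:    {d_spread:.4f}")
print(f"collapsed policy: {d_collapsed:.4f}")
```

In this toy setup the nearly deterministic policy incurs a much larger penalty than the spread-out one, which is the qualitative behavior one would want from a stochasticity-promoting regularizer. In an actual actor-critic algorithm the same estimate would be computed on actions sampled from the policy network and differentiated through the sampling path; that part is not shown here.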

Topics

Reinforcement Learning > Methods > Deep RL

Keywords

policy optimization, continuous action space, entropy regularization, stochastic policy
