Multi-Constraint Deep Reinforcement Learning for Smooth Action Control

Guangyuan Zou; Ying He; F. Richard Yu; Longquan Chen; Weike Pan; Zhong Ming

IJCAI 2022

Abstract

Deep reinforcement learning (DRL) has been studied in a variety of challenging decision-making tasks, e.g., autonomous driving. However, DRL typically suffers from the action shaking problem: agents may select very different actions even when the states differ only slightly. One crucial reason for this issue is inappropriate reward design in DRL. In this paper, to address this issue, we propose a novel way to incorporate the smoothness of actions into the reward. Specifically, we introduce sub-rewards and add multiple constraints related to these sub-rewards. In addition, we propose a multi-constraint proximal policy optimization (MCPPO) method to solve the resulting multi-constraint DRL problem. Extensive simulation results show that the proposed MCPPO method achieves better action smoothness than the traditional proportional-integral-derivative (PID) controller and mainstream DRL algorithms. The video is available at https://youtu.be/F2jpaSm7YOg.
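The abstract does not say how MCPPO defines its sub-rewards or enforces its constraints. Purely as an illustration of the general idea, the sketch below assumes (1) a smoothness sub-reward equal to the negative L1 change between consecutive actions, and (2) a generic Lagrangian relaxation with projected dual ascent, a common way to handle constrained RL objectives. The function names, thresholds, and learning rate are all hypothetical; this is not the authors' MCPPO.

```python
from typing import List, Sequence

# Hypothetical helpers illustrating constrained reward shaping;
# NOT the MCPPO algorithm from the paper.


def smoothness_subreward(prev_action: Sequence[float], action: Sequence[float]) -> float:
    """Assumed sub-reward: negative L1 change between consecutive actions.

    Larger jumps between successive actions give a more negative value,
    so a smoother action sequence scores closer to zero.
    """
    return -sum(abs(a - p) for a, p in zip(action, prev_action))


def lagrangian_reward(task_reward: float,
                      subrewards: Sequence[float],
                      thresholds: Sequence[float],
                      multipliers: Sequence[float]) -> float:
    """Combine a task reward with constraints of the form subreward_i >= threshold_i.

    Each constraint adds lambda_i * (subreward_i - threshold_i), the standard
    Lagrangian relaxation term for a maximization problem.
    """
    return task_reward + sum(
        lam * (s - d) for lam, s, d in zip(multipliers, subrewards, thresholds)
    )


def update_multipliers(multipliers: Sequence[float],
                       avg_subrewards: Sequence[float],
                       thresholds: Sequence[float],
                       lr: float = 0.05) -> List[float]:
    """Projected dual step: raise lambda_i when constraint i is violated on average.

    If avg_subreward_i < threshold_i the gap is positive and lambda_i grows;
    otherwise lambda_i shrinks, but it is clipped so it never goes below zero.
    """
    return [max(0.0, lam + lr * (d - s))
            for lam, s, d in zip(multipliers, avg_subrewards, thresholds)]


if __name__ == "__main__":
    # Toy numbers, chosen only to show the mechanics.
    s = smoothness_subreward([0.0, 0.5], [0.2, 0.1])
    r = lagrangian_reward(1.0, [s], thresholds=[-0.2], multipliers=[0.5])
    lams = update_multipliers([0.5], [s], thresholds=[-0.2], lr=0.1)
    print(s, r, lams)
```

In a PPO-style training loop, one plausible arrangement is to feed the shaped reward into the advantage estimator and update the multipliers once per batch, so that constraints violated on average gain weight over time; whether MCPPO does anything like this is not stated in the abstract.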

Topics

Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Applications > Robotics

Keywords

deep reinforcement learning, autonomous driving, proximal policy optimization, action smoothness, multi-constraint optimization
