Exploring Safer Behaviors for Deep Reinforcement Learning

Enrico Marchesini; Davide Corsi; Alessandro Farinelli

2022 AAAI AAAI 2022

Exploring Safer Behaviors for Deep Reinforcement Learning

Abstract

Abstract We consider Reinforcement Learning (RL) problems where an agent attempts to maximize a reward signal while minimizing a cost function that models unsafe behaviors. Such formalization is addressed in the literature using constrained optimization on the cost, limiting the exploration and leading to a significant trade-off between cost and reward. In contrast, we propose a Safety-Oriented Search that complements Deep RL algorithms to bias the policy toward safety within an evolutionary cost optimization. We leverage evolutionary exploration benefits to design a novel concept of safe mutations that use visited unsafe states to explore safer actions. We further characterize the behaviors of the policies over desired specifics with a sample-based bound estimation, which makes prior verification analysis tractable in the training loop. Hence, driving the learning process towards safer regions of the policy space. Empirical evidence on the Safety Gym benchmark shows that we successfully avoid drawbacks on the return while improving the safety of the policy.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Reinforcement Learning

🧭 Keyword Pioneer — evolutionary exploration

🐣 Hot Topic Early Bird — safe reinforcement learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Enrico Marchesini , Davide Corsi , Alessandro Farinelli

Topics

Artificial Intelligence > Core AI > AI Safety Reinforcement Learning > Methods > Deep RL Artificial Intelligence > Core AI > Robotics Deep Learning > Learning Types > Reinforcement Learning Artificial Intelligence > Core AI > Safety

Keywords

deep reinforcement learning policy optimization constrained optimization safe reinforcement learning cost function evolutionary algorithm evolutionary exploration safe mutation

Download PDF

Related papers

Dynamic Spatial Propagation Network for Depth Completion 2022

FedFR: Joint Optimization Federated Framework for Generic and Personalized Face Recognition 2022

Memory-Guided Semantic Learning Network for Temporal Sentence Grounding 2022

AnchorFace: Boosting TAR@FAR for Practical Face Recognition 2022

Parallel and High-Fidelity Text-to-Lip Generation 2022