Safe Reinforcement Learning via Statistical Model Predictive Shielding

Osbert Bastani; Shuo Li

RSS 2021

Abstract

Reinforcement learning is a promising approach to solving hard robotics tasks. An important challenge is ensuring safety, e.g., that a walking robot does not fall over or an autonomous car does not crash into an obstacle. We build on an approach that composes the learned policy with a backup policy: it uses the learned policy on the interior of the region where the backup policy is guaranteed to be safe, and switches to the backup policy on the boundary of this region. The key challenge is checking when the backup policy is guaranteed to be safe. Our algorithm, statistical model predictive shielding (SMPS), uses sampling-based verification and linear systems analysis to perform this check. We prove that SMPS ensures safety with high probability, and empirically evaluate its performance on several benchmarks.
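
The composition of a learned policy with a backup policy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `make_shielded_policy`, `dynamics`, and `is_recoverable` are hypothetical, the dynamics are assumed deterministic and known, and the switch is assumed to use a one-step lookahead. The check `is_recoverable` is left abstract here; in SMPS it would be answered by sampling-based verification and linear systems analysis, which this sketch does not attempt.

```python
from typing import Callable, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
Policy = Callable[[State], Action]


def make_shielded_policy(
    learned_policy: Policy,
    backup_policy: Policy,
    dynamics: Callable[[State, Action], State],
    is_recoverable: Callable[[State], bool],
) -> Policy:
    """Compose a learned policy with a backup policy.

    `is_recoverable(x)` is a hypothetical stand-in for the check that the
    backup policy is guaranteed to keep the system safe starting from x.
    """

    def shielded(state: State) -> Action:
        candidate = learned_policy(state)
        # One-step lookahead (an assumption of this sketch): take the learned
        # action only if the backup policy could still take over safely from
        # the state that action leads to.
        if is_recoverable(dynamics(state, candidate)):
            return candidate
        return backup_policy(state)

    return shielded


# Toy 1-D system, purely illustrative: the state is a position, the action is
# a displacement, and the system counts as safe while |position| <= 1.
def toy_dynamics(state: State, action: Action) -> State:
    return (state[0] + action[0],)


def toy_learned(state: State) -> Action:
    return (0.3,)  # always pushes toward the unsafe region


def toy_backup(state: State) -> Action:
    return (-0.5 * state[0],)  # contracts toward the origin


def toy_recoverable(state: State) -> bool:
    return abs(state[0]) <= 0.8  # conservative inner region, chosen by hand


def rollout(policy: Policy, start: State, steps: int) -> list:
    """Simulate the toy system and return the visited states."""
    states = [start]
    for _ in range(steps):
        states.append(toy_dynamics(states[-1], policy(states[-1])))
    return states


shielded_policy = make_shielded_policy(
    toy_learned, toy_backup, toy_dynamics, toy_recoverable
)
trajectory = rollout(shielded_policy, (0.0,), 20)
```

In this toy setting the unshielded learned policy leaves the safe set after a few steps, while the shielded rollout hands control to the backup policy whenever the next state would fall outside the hand-picked recoverable region.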

Topics

Artificial Intelligence > Core AI > AI Safety
Reinforcement Learning > Applications > Robotics
Robotics > Systems > Control Systems
Machine Learning > Learning Types > Reinforcement Learning

Keywords

safe reinforcement learning, statistical model predictive shielding, backup policy, sampling-based verification, robotics safety, statistical verification, robot safety, model predictive shielding
