Safe Reinforcement Learning with Linear Function Approximation

Sanae Amani; Christos Thrampoulidis; Lin Yang

ICML 2021

Abstract

Safety in reinforcement learning has become increasingly important in recent years. Yet existing solutions either fail to strictly avoid unsafe actions, which may lead to catastrophic results in safety-critical systems, or fail to provide regret guarantees in settings where the safety constraints must be learned. In this paper, we address both problems by first modeling safety as an unknown linear cost function of states and actions, which must always fall below a certain threshold. We then present two algorithms, termed SLUCB-QVI and RSLUCB-QVI, for episodic Markov decision processes (MDPs) with linear function approximation. We show that SLUCB-QVI and RSLUCB-QVI incur no safety violation while achieving a $\tilde{\mathcal{O}}\left(\kappa\sqrt{d^3H^3T}\right)$ regret, nearly matching that of state-of-the-art unsafe algorithms, where $H$ is the duration of each episode, $d$ is the dimension of the feature mapping, $\kappa$ is a constant characterizing the safety constraints, and $T$ is the total number of action plays. We further present numerical simulations that corroborate our theoretical findings.
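The abstract describes safety as an unknown linear cost of states and actions that must stay below a threshold. One common way to act safely under such a model is to estimate the cost parameter by ridge regression and accept only actions whose pessimistic (upper-confidence) cost estimate is below the threshold. The sketch below illustrates that general idea only; it is not the authors' SLUCB-QVI or RSLUCB-QVI. The function name `estimated_safe_actions`, the confidence radius `beta`, the regularizer `lam`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def estimated_safe_actions(features, past_features, past_costs, tau, beta=1.0, lam=1.0):
    """Return indices of candidate actions whose pessimistic cost is <= tau.

    Hypothetical helper: assumes cost(s, a) = <phi(s, a), gamma> for an
    unknown vector gamma, with one feature row per candidate action.
    """
    d = features.shape[1]
    # Regularized Gram matrix built from the features observed so far.
    gram = lam * np.eye(d) + past_features.T @ past_features
    # Ridge-regression estimate of the unknown cost parameter.
    gamma_hat = np.linalg.solve(gram, past_features.T @ past_costs)
    # Confidence width ||phi||_{gram^{-1}} for every candidate action.
    gram_inv = np.linalg.inv(gram)
    widths = np.sqrt(np.einsum("ij,jk,ik->i", features, gram_inv, features))
    # Pessimistic cost: point estimate plus scaled confidence width.
    upper = features @ gamma_hat + beta * widths
    return np.flatnonzero(upper <= tau)


# Toy data (assumed): two actions with true costs 0.1 and 0.9, threshold 0.5.
candidates = np.eye(2)
history = np.vstack([np.tile([1.0, 0.0], (100, 1)), np.tile([0.0, 1.0], (100, 1))])
costs = np.concatenate([np.full(100, 0.1), np.full(100, 0.9)])
print(estimated_safe_actions(candidates, history, costs, tau=0.5))
```

In this toy setup only the first action passes the check. With an empty history the confidence widths are large and every action is rejected, which shows why a conservative rule of this kind needs some initial knowledge before it can explore.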

Topics

Artificial Intelligence > Core AI > AI Safety
Machine Learning > Optimization & Theory > Learning Theory
Reinforcement Learning > Methods > Deep RL
Machine Learning > Learning Types > Reinforcement Learning
Artificial Intelligence > Core AI > Safety

Keywords

episodic learning; Markov decision process; linear function approximation; safe reinforcement learning; regret bound; safety constraint; constraint learning
