Improve Robustness of Reinforcement Learning against Observation Perturbations via l∞ Lipschitz Policy Networks

Buqing Nie; Jingtian Ji; Yangqing Fu; Yue Gao

AAAI 2024

Abstract

Deep Reinforcement Learning (DRL) has achieved remarkable advances in sequential decision tasks. However, recent works have revealed that DRL agents are susceptible to slight perturbations in observations. This vulnerability raises concerns regarding the effectiveness and robustness of deploying such agents in real-world applications. In this work, we propose a novel robust reinforcement learning method called SortRL, which improves the robustness of DRL policies against observation perturbations from the perspective of the network architecture. We employ a novel architecture for the policy network that incorporates global $l_\infty$ Lipschitz continuity and provide a convenient method to enhance policy robustness based on the output margin. In addition, we design a training framework for SortRL that solves the given tasks while maintaining robustness against $l_\infty$-bounded perturbations of the observations. We evaluate the method in several experiments, including classic control tasks and video games. The results demonstrate that SortRL achieves state-of-the-art robustness across different perturbation strengths.
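To make the link between $l_\infty$ Lipschitz continuity and the output margin concrete, here is a minimal sketch in Python with NumPy. It is not the SortRL architecture, which the abstract does not specify. As an illustrative stand-in it assumes a toy network built from $l_\infty$-distance units, each of which is 1-Lipschitz under the $l_\infty$ norm; the names `linf_dist_layer`, `policy_scores`, and `certified_radius` are hypothetical. If every action score is 1-Lipschitz in the observation, a perturbation with $\|\delta\|_\infty \le \epsilon$ moves each score by at most $\epsilon$, so the greedy action cannot change while $\epsilon$ is below half the gap between the best and second-best scores.

```python
import numpy as np


def linf_dist_layer(x, W, b):
    # Unit j outputs ||x - W[j]||_inf + b[j], which is 1-Lipschitz in x under l_inf.
    return np.max(np.abs(x[None, :] - W), axis=1) + b


def policy_scores(x, params):
    # A composition of 1-Lipschitz layers is itself 1-Lipschitz.
    h = x
    for W, b in params:
        h = linf_dist_layer(h, W, b)
    return -h  # negation keeps the Lipschitz constant; higher score = preferred action


def certified_radius(scores):
    # Half the margin between the largest and second-largest score.
    second, best = np.sort(scores)[-2:]
    return (best - second) / 2.0


# Random weights for illustration only (assumed sizes, no training involved).
rng = np.random.default_rng(0)
obs_dim, hidden, n_actions = 4, 8, 3
params = [
    (rng.normal(size=(hidden, obs_dim)), rng.normal(size=hidden)),
    (rng.normal(size=(n_actions, hidden)), rng.normal(size=n_actions)),
]

obs = rng.normal(size=obs_dim)
scores = policy_scores(obs, params)
action = int(np.argmax(scores))
radius = certified_radius(scores)

# Empirical check: perturbations strictly inside the radius never flip the action.
eps = 0.99 * radius
stable = all(
    int(np.argmax(policy_scores(obs + rng.uniform(-eps, eps, size=obs_dim), params))) == action
    for _ in range(1000)
)
print(f"action={action}, certified l_inf radius={radius:.4f}, stable={stable}")
```

The random sampling only illustrates the guarantee; the guarantee itself follows from the Lipschitz bound and holds for every perturbation inside the radius, which is why a larger output margin directly translates into robustness against stronger perturbations.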

Topics

Machine Learning > Learning Types > Adversarial Learning
Machine Learning > Application Areas > Domain Generalization
Reinforcement Learning > Methods > Deep RL
Artificial Intelligence > Core AI > Robotics

Keywords

deep reinforcement learning, reinforcement learning, adversarial robustness, network architecture, lipschitz continuity, policy network, policy robustness, lipschitz policy network, observation perturbation
