Pure-Past Action Masking

Giovanni Varricchione; Natasha Alechina; Mehdi Dastani; Giuseppe De Giacomo; Brian Logan; Giuseppe Perelli

AAAI 2024

Abstract

We present Pure-Past Action Masking (PPAM), a lightweight approach to action masking for safe reinforcement learning. In PPAM, actions are disallowed (“masked”) according to specifications expressed in Pure-Past Linear Temporal Logic (PPLTL). PPAM can enforce non-Markovian constraints, i.e., constraints based on the history of the system rather than just the current state of the (possibly hidden) MDP. The features used in the safety constraints need not be the same as those used by the learning agent, allowing a clear separation of concerns between the safety constraints and the reward specification of the (learning) agent. We prove formally that an agent trained with PPAM can learn any optimal policy that satisfies the safety constraints, and that PPAM is as expressive as shields, another approach to enforcing non-Markovian constraints in RL. Finally, we provide empirical results showing how PPAM can guarantee constraint satisfaction in practice.
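To illustrate what masking on the history (rather than the current state) can look like, here is a minimal Python sketch. It is not the authors' implementation. It assumes that formulas are written as nested tuples, that each step is labelled with a set of propositions, and that the chosen action appears as a proposition in that step's label. The names PPLTLMonitor, peek, step and allowed_actions are hypothetical. The sketch relies only on the standard finite-trace semantics of the past operators Y (yesterday), O (once), H (historically) and S (since), which can be evaluated from the truth values of the subformulas at the previous step.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Formula = Tuple  # nested tuples, e.g. ("O", ("atom", "key")); an assumed encoding


class PPLTLMonitor:
    """Evaluates a pure-past formula step by step, storing only the
    truth values of its subformulas at the previous step."""

    def __init__(self, formula: Formula) -> None:
        self.formula = formula
        self.prev: Optional[Dict[Formula, bool]] = None  # None before the first step

    def _eval(self, f: Formula, labels: Set[str], now: Dict[Formula, bool]) -> bool:
        if f in now:
            return now[f]
        op, prev = f[0], self.prev
        if op == "atom":
            v = f[1] in labels
        elif op == "not":
            v = not self._eval(f[1], labels, now)
        elif op in ("and", "or"):
            # Evaluate both sides eagerly so every subformula is recorded.
            a = self._eval(f[1], labels, now)
            b = self._eval(f[2], labels, now)
            v = (a and b) if op == "and" else (a or b)
        elif op == "Y":  # held at the previous step
            self._eval(f[1], labels, now)  # recorded for the next step
            v = prev is not None and prev[f[1]]
        elif op == "O":  # held now or at some earlier step
            a = self._eval(f[1], labels, now)
            v = a or (prev is not None and prev[f])
        elif op == "H":  # held at every step so far
            a = self._eval(f[1], labels, now)
            v = a and (prev is None or prev[f])
        elif op == "S":  # f[2] held at some point, and f[1] ever since
            a = self._eval(f[1], labels, now)
            b = self._eval(f[2], labels, now)
            v = b or (a and prev is not None and prev[f])
        else:
            raise ValueError(f"unknown operator: {op!r}")
        now[f] = v
        return v

    def peek(self, labels: Set[str]) -> bool:
        """Truth value if `labels` were the next step; history is unchanged."""
        return self._eval(self.formula, labels, {})

    def step(self, labels: Set[str]) -> bool:
        """Commit `labels` as the next step and return the formula's value."""
        now: Dict[Formula, bool] = {}
        value = self._eval(self.formula, labels, now)
        self.prev = now
        return value


def allowed_actions(monitor: PPLTLMonitor, features: Set[str],
                    actions: Iterable[str]) -> List[str]:
    """Keep the actions whose hypothetical next step satisfies the formula."""
    return [a for a in actions if monitor.peek(set(features) | {a})]


# Assumed example constraint: "open" is permitted only if "key" was seen before.
constraint = ("or", ("not", ("atom", "open")), ("O", ("atom", "key")))
monitor = PPLTLMonitor(constraint)
before_key = allowed_actions(monitor, set(), ["move", "open"])
monitor.step({"key", "move"})
after_key = allowed_actions(monitor, set(), ["move", "open"])
```

In this sketch the mask depends on whether "key" was observed at any earlier step, which is information a purely state-based mask would not have unless it were encoded in the state. The propositions used by the monitor are independent of whatever observation the learning agent receives.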

Topics

Artificial Intelligence > Core AI > AI Safety; Reinforcement Learning > Methods > Deep RL; Reinforcement Learning > Applications > Game AI; Knowledge & Reasoning > Reasoning > Formal Methods; Artificial Intelligence > Core AI > Game Theory; Deep Learning > Learning Types > Reinforcement Learning; Artificial Intelligence > Core AI > Reinforcement Learning; Artificial Intelligence > Core AI > Safety

Keywords

formal methods; constraint satisfaction; safe reinforcement learning; temporal logic; action masking; pure-past linear temporal logic
