Handling Long and Richly Constrained Tasks through Constrained Hierarchical Reinforcement Learning

Yuxiao Lu; Arunesh Sinha; Pradeep Varakantham

2024 AAAI AAAI 2024

Handling Long and Richly Constrained Tasks through Constrained Hierarchical Reinforcement Learning

Abstract

Abstract Safety in goal directed Reinforcement Learning (RL) settings has typically been handled through constraints over trajectories and have demonstrated good performance in primarily short horizon tasks. In this paper, we are specifically interested in the problem of solving temporally extended decision making problems such as robots cleaning different areas in a house while avoiding slippery and unsafe areas (e.g., stairs) and retaining enough charge to move to a charging dock; in the presence of complex safety constraints. Our key contribution is a (safety) Constrained Search with Hierarchical Reinforcement Learning (CoSHRL) mechanism that combines an upper level constrained search agent (which computes a reward maximizing policy from a given start to a far away goal state while satisfying cost constraints) with a low-level goal conditioned RL agent (which estimates cost and reward values to move between nearby states). A major advantage of CoSHRL is that it can handle constraints on the cost value distribution (e.g., on Conditional Value at Risk, CVaR) and can adjust to flexible constraint thresholds without retraining. We perform extensive experiments with different types of safety constraints to demonstrate the utility of our approach over leading approaches in constrained and hierarchical RL.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Reinforcement Learning and Robotics

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Yuxiao Lu , Arunesh Sinha , Pradeep Varakantham

Topics

Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Policy Learning Reinforcement Learning > Applications > Robotics Artificial Intelligence > Core AI > Robotics Artificial Intelligence > Core AI > Reinforcement Learning Robotics > Applications > Robotics

Keywords

motion planning temporal difference learning policy optimization robot navigation constrained optimization hierarchical reinforcement learning constraint satisfaction safe reinforcement learning safety constraint robotic control conditional value at risk temporal extension

Download PDF

Related papers

Goal Alignment: Re-analyzing Value Alignment Problems Using Human-Aware AI 2024

Meta-Inverse Reinforcement Learning for Mean Field Games via Probabilistic Context Variables 2024

Suppressing Uncertainty in Gaze Estimation 2024

Mask-Homo: Pseudo Plane Mask-Guided Unsupervised Multi-Homography Estimation 2024

Heterogeneous Test-Time Training for Multi-Modal Person Re-identification 2024