CAPO: A Unified Policy Gradient Approach for Reward and Cost Optimization in Safe Reinforcement Learning (Student Abstract)

Xiaotao Liu; Prashant Mohit; Arvind Easwaran

AAAI 2026


Abstract

In safe reinforcement learning (SRL), there is an inherent conflict between maximizing reward and minimizing cost. We propose a novel approach that effectively resolves this conflict in joint optimization. When the cost exceeds the threshold, we perform cost-reducing updates. Otherwise, we compute policy gradients that maximize expected reward and use a second-order Taylor approximation to evaluate whether these reward-maximizing gradients would violate the cost constraint. If a constraint violation is detected, we adjust the gradient direction to maintain safety compliance; otherwise, we execute a standard reward-increasing policy update. This approach helps ensure that reward-seeking updates do not inadvertently increase costs, thereby reducing the likelihood of constraint violations. Empirical tests show that our framework manages reward-cost trade-offs through reward augmentation and cost shaping, improving both performance and safety without switching optimization strategies. The results demonstrate that treating both objectives concurrently within a single policy gradient update is a viable way to improve safe reinforcement learning methods.
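The update rule described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`capo_style_step`, `predicted_cost`), the step size, and in particular the way the reward gradient is "adjusted" (removing its cost-increasing component by projection, then backtracking the step until the Taylor estimate satisfies the threshold) are hypothetical choices made for illustration, since the abstract does not specify the adjustment. The cost gradient and cost Hessian are assumed to be supplied by the caller.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [dot(row, v) for row in m]


def scale(c: float, v: Sequence[float]) -> Vector:
    return [c * x for x in v]


def add(u: Sequence[float], v: Sequence[float]) -> Vector:
    return [a + b for a, b in zip(u, v)]


def predicted_cost(cost: float, cost_grad: Vector, cost_hess: Matrix, step: Vector) -> float:
    """Second-order Taylor estimate of the cost after the parameter change `step`:
    J_c + b^T d + 0.5 * d^T H d."""
    return cost + dot(cost_grad, step) + 0.5 * dot(step, matvec(cost_hess, step))


def capo_style_step(
    reward_grad: Vector,
    cost: float,
    cost_grad: Vector,
    cost_hess: Matrix,
    threshold: float,
    lr: float = 0.1,
    shrink: float = 0.5,
    max_backtracks: int = 10,
) -> Tuple[Vector, str]:
    """Hypothetical sketch: return a parameter step (theta <- theta + step) and a label."""
    # Case 1: the constraint is already violated, so descend the cost.
    if cost > threshold:
        return scale(-lr, cost_grad), "cost_reduction"

    # Case 2: try a plain reward-ascent step and check it with the Taylor estimate.
    step = scale(lr, reward_grad)
    if predicted_cost(cost, cost_grad, cost_hess, step) <= threshold:
        return step, "reward"

    # Case 3 (assumed adjustment): drop the part of the reward gradient that
    # increases cost to first order, then shrink the step until the estimate is safe.
    direction = list(reward_grad)
    gb, bb = dot(reward_grad, cost_grad), dot(cost_grad, cost_grad)
    if gb > 0 and bb > 0:
        direction = add(reward_grad, scale(-gb / bb, cost_grad))
    step = scale(lr, direction)
    for _ in range(max_backtracks):
        if predicted_cost(cost, cost_grad, cost_hess, step) <= threshold:
            return step, "adjusted"
        step = scale(shrink, step)
    return [0.0] * len(step), "rejected"


if __name__ == "__main__":
    zero_hess = [[0.0, 0.0], [0.0, 0.0]]
    # Reward ascent would push the cost from 0.9 to about 1.4 > 1.0, so the step is adjusted.
    print(capo_style_step([1.0, 1.0], 0.9, [0.0, 1.0], zero_hess, threshold=1.0, lr=0.5))
```

In the toy call, the reward gradient `[1.0, 1.0]` has a component along the cost gradient `[0.0, 1.0]`; removing it leaves the direction `[1.0, 0.0]`, which still increases reward but keeps the predicted cost at 0.9. In a real policy-gradient setting, the explicit Hessian would typically be replaced by Hessian-vector products, which is also an assumption here rather than a detail given in the abstract.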




Topics

Artificial Intelligence > Core AI > AI Safety; Reinforcement Learning > Methods > Deep RL

Keywords

policy gradient, safe reinforcement learning, cost constraint, reward optimization
