Safe Multi-agent Reinforcement Learning with Natural Language Constraints

Ziyan Wang; Meng Fang; Tristan Tomilin; Fei Fang; Yali Du

AAAI 2026

Abstract

Safe Multi-Agent Reinforcement Learning (MARL) typically relies on manually specified numeric cost functions to ensure that policy behaviours respect safety constraints. As systems scale and human-defined constraints become more diverse, context-dependent, and frequently updated, hand-crafting such cost functions becomes prohibitively complex, tedious, and error-prone. Natural language offers an intuitive and flexible alternative for defining constraints, enabling broader accessibility and easier adaptation to new scenarios and evolving rules. However, current MARL frameworks lack effective mechanisms to incorporate free-form textual constraints in a robust and principled way. To bridge this gap, we introduce Safe Multi-Agent Reinforcement Learning with natural Language constraints (SMALL), a framework that leverages fine-tuned language models to parse and encode textual constraints into semantically meaningful embeddings. These embeddings characterise prohibited states or behaviours and enable automatic prediction of constraint violations. We integrate the resulting learned costs directly into MARL training, allowing agents to optimise task performance while simultaneously minimising constraint violations, without requiring manually engineered numeric cost functions. To rigorously evaluate our method, we also propose the LaMaSafe benchmark, a set of diverse multi-agent tasks designed to assess the capability of MARL algorithms to understand and adhere to realistic, human-provided natural language constraints. Experimental results across LaMaSafe environments show that SMALL achieves task performance comparable to strong MARL baselines while significantly reducing constraint violations. While SMALL does not provide formal safety guarantees, it demonstrates that natural language can be used to shape multi-agent behaviour toward safer policies.
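The idea of turning constraint embeddings into a learned cost can be illustrated with a minimal sketch. It is not the SMALL implementation: the abstract does not specify how violations are predicted or how costs enter training, so the cosine-similarity threshold, the fixed penalty weight, the hand-written toy embeddings, and the function names `cosine_similarity`, `predicted_cost`, and `penalised_reward` are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predicted_cost(constraint_emb, observation_emb, threshold=0.8):
    """Assumed violation predictor: cost 1.0 when the observation embedding
    is close to the embedding of the prohibited behaviour, else 0.0."""
    similarity = cosine_similarity(constraint_emb, observation_emb)
    return 1.0 if similarity >= threshold else 0.0


def penalised_reward(task_reward, cost, penalty_weight=1.0):
    """Assumed integration into training: subtract a weighted cost from the
    task reward (a simple penalty scheme, not necessarily the paper's)."""
    return task_reward - penalty_weight * cost


# Toy embeddings standing in for language-model outputs (hypothetical values).
constraint_emb = [1.0, 0.0, 0.0]   # e.g. a "do not enter the marked zone" rule
near_violation = [0.9, 0.1, 0.0]   # observation resembling the prohibited state
safe_state = [0.0, 1.0, 0.0]       # observation unrelated to the constraint

cost_near = predicted_cost(constraint_emb, near_violation)
cost_safe = predicted_cost(constraint_emb, safe_state)

print(cost_near, cost_safe)
print(penalised_reward(1.0, cost_near, penalty_weight=0.5))
```

In a real system the toy vectors would be replaced by embeddings from a fine-tuned language model, and the fixed penalty weight would typically be tuned or learned rather than set by hand.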

Topics

Artificial Intelligence > Core AI > AI Safety
Artificial Intelligence > Core AI > Multi-Agent Systems

Keywords

multi-agent reinforcement learning; cost function; language model; constraint violation; natural language constraint
