A Unified Diffusion-Based Framework for Multi-Agent Trajectory Prediction Integrating Structured Multi-Modal Representations

Chenxi Yang; Suyang Xi; Hong Ding; Yiqing Shen; Yunhao Liu

WACV 2026

Abstract

Autonomous multi-agent trajectory prediction in open-world scenarios presents persistent challenges, including high behavioral uncertainty, long-horizon dependencies, and the lack of structured guidance during generation. Existing generative approaches often compromise behavioral fidelity in favor of accuracy or diversity, resulting in predictions that are either unrealistic or difficult to control. We propose M^2Traj, a unified framework that couples a closed-loop conditional diffusion model with structured trajectory reasoning and behavior-driven constraints. M^2Traj features a history-guided encoder that captures long-range cross-agent dependencies and scene semantics, and a dynamic closed-loop rollout mechanism that refines predictions through goal-conditioned denoising with iterative feedback. To enable fine-grained control, we introduce a learnable behavior guidance module that softly enforces constraints on velocity, collision risk, comfort, and traffic rule adherence. By jointly modeling agent interactions, future constraints, and uncertainty within a structured generative process, M^2Traj delivers controllable and reliable predictions across diverse urban scenarios. Extensive experiments on three large-scale benchmarks (Waymo, HighD, and MoCAD) demonstrate that M^2Traj achieves balanced and reliable performance across standard accuracy, diversity, and behavior-sensitive metrics, highlighting its potential as a generalizable solution for controllable, structure-aware trajectory prediction in complex multi-agent environments.
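To make the idea of softly enforcing a velocity constraint concrete, the following is a minimal sketch and not the authors' behavior guidance module. The abstract describes that module as learnable; this sketch instead assumes a hand-written quadratic penalty on speed above a limit and nudges a candidate trajectory down its gradient, roughly as a guidance term might do between denoising steps. The function names (`speed_penalty`, `speed_penalty_grad`, `guide`) and the parameters `dt`, `v_max`, `step`, and `iters` are hypothetical and chosen only for illustration.

```python
import math


def speed_penalty(traj, dt, v_max):
    """Soft cost: squared excess of each segment's speed over v_max."""
    cost = 0.0
    for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        cost += max(0.0, speed - v_max) ** 2
    return cost


def speed_penalty_grad(traj, dt, v_max):
    """Analytic gradient of speed_penalty with respect to every waypoint."""
    grad = [[0.0, 0.0] for _ in traj]
    for i in range(len(traj) - 1):
        dx = traj[i + 1][0] - traj[i][0]
        dy = traj[i + 1][1] - traj[i][1]
        dist = math.hypot(dx, dy)
        excess = dist / dt - v_max
        if dist == 0.0 or excess <= 0.0:
            continue  # segment already satisfies the limit: no push
        scale = 2.0 * excess / (dist * dt)
        grad[i + 1][0] += scale * dx
        grad[i + 1][1] += scale * dy
        grad[i][0] -= scale * dx
        grad[i][1] -= scale * dy
    return grad


def guide(traj, dt, v_max, step=0.05, iters=50):
    """Assumed guidance loop: gradient steps that reduce the soft penalty.

    The first waypoint (the agent's current position) is held fixed.
    """
    traj = [list(p) for p in traj]
    for _ in range(iters):
        grad = speed_penalty_grad(traj, dt, v_max)
        for i in range(1, len(traj)):
            traj[i][0] -= step * grad[i][0]
            traj[i][1] -= step * grad[i][1]
    return traj
```

Because the penalty is soft, a trajectory that exceeds the limit is pulled toward feasibility rather than clipped, so a guidance weight (here `step`) can trade constraint adherence against fidelity to the sampled trajectory. Terms for collision risk, comfort, or traffic rules could be added to the cost in the same way under this assumed formulation.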
