SparseCoop: Cooperative Perception with Kinematic-Grounded Queries

Jiahao Wang; Zhongwei Jiang; Wenchao Sun; Jiaru Zhong; Haibao Yu; Yuner Zhang; Chenyang Lu; Chuang Zhang; Lei He; Shaobing Xu; Jianqiang Wang

AAAI 2026

Abstract

Cooperative perception is critical for autonomous driving because it overcomes the inherent limitations of a single vehicle, such as occlusions and constrained fields of view. However, current approaches that share dense Bird's-Eye-View (BEV) features are constrained by quadratically scaling communication costs and lack the flexibility and interpretability needed for precise alignment across asynchronous or disparate viewpoints. While emerging sparse query-based methods offer an alternative, they often suffer from inadequate geometric representations, suboptimal fusion strategies, and training instability. In this paper, we propose SparseCoop, a fully sparse cooperative perception framework for 3D detection and tracking that completely discards intermediate BEV representations. Our framework features three innovations: a kinematic-grounded instance query that uses an explicit state vector with 3D geometry and velocity for precise spatio-temporal alignment; a coarse-to-fine aggregation module that effectively integrates information from both matched and unmatched instances; and a cooperative instance denoising task that provides stable, abundant supervision to accelerate and stabilize training. Experiments on the V2X-Seq and Griffin datasets show that SparseCoop achieves state-of-the-art performance. Notably, it delivers this performance with superior computational efficiency and a highly competitive transmission cost, while remaining robust to real-world challenges such as communication latency.
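To illustrate why an explicit state vector with 3D geometry and velocity helps with spatio-temporal alignment, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical state layout (box center, box size, heading, velocity), a constant-velocity motion model for latency compensation, and a planar rigid transform (rotation about the z-axis plus translation) between the sender's and receiver's frames. The names `KinematicState`, `compensate_latency`, and `to_receiver_frame` are invented for this example.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class KinematicState:
    # Hypothetical layout: box center, box size, heading, velocity.
    x: float
    y: float
    z: float
    length: float
    width: float
    height: float
    yaw: float
    vx: float
    vy: float
    vz: float


def wrap_angle(a: float) -> float:
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def compensate_latency(s: KinematicState, dt: float) -> KinematicState:
    """Advance the box center by dt seconds (constant-velocity assumption)."""
    return replace(s, x=s.x + s.vx * dt, y=s.y + s.vy * dt, z=s.z + s.vz * dt)


def to_receiver_frame(
    s: KinematicState, tx: float, ty: float, tz: float, dyaw: float
) -> KinematicState:
    """Rotate about z by dyaw, then translate by (tx, ty, tz).

    Position gets rotation and translation; velocity gets rotation only;
    box size is unchanged by a rigid transform.
    """
    c, sn = math.cos(dyaw), math.sin(dyaw)
    return replace(
        s,
        x=c * s.x - sn * s.y + tx,
        y=sn * s.x + c * s.y + ty,
        z=s.z + tz,
        yaw=wrap_angle(s.yaw + dyaw),
        vx=c * s.vx - sn * s.vy,
        vy=sn * s.vx + c * s.vy,
    )


# Example: an instance sent 0.2 s ago, moving at 5 m/s along the sender's x-axis.
sent = KinematicState(10.0, 0.0, 0.0, 4.5, 1.9, 1.6, 0.0, 5.0, 0.0, 0.0)
aligned = to_receiver_frame(
    compensate_latency(sent, dt=0.2), tx=1.0, ty=2.0, tz=0.0, dyaw=math.pi / 2
)
```

Because every field of the state has a physical meaning, both steps are closed-form operations on the instance itself, which is the kind of interpretability the abstract contrasts with aligning dense BEV feature maps.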

Topics

Computer Vision > Analysis > Object Tracking
Computer Vision > Domain-Specific > Autonomous Driving

Keywords

autonomous driving, bird's eye view, 3D object detection, cooperative perception, multi-agent tracking
