Multi-View Attentive Contextualization for Multi-View 3D Object Detection

Xianpeng Liu; Ce Zheng; Ming Qian; Nan Xue; Chen Chen; Zhebin Zhang; Chen Li; Tianfu Wu

2024 CVPR CVPR 2024

Multi-View Attentive Contextualization for Multi-View 3D Object Detection

Abstract

We present Multi-View Attentive Contextualization (MvACon) a simple yet effective method for improving 2D-to-3D feature lifting in query-based multi-view 3D (MV3D) object detection. Despite remarkable progress witnessed in the field of query-based MV3D object detection prior art often suffers from either the lack of exploiting high-resolution 2D features in dense attention-based lifting due to high computational costs or from insufficiently dense grounding of 3D queries to multi-scale 2D features in sparse attention-based lifting. Our proposed MvACon hits the two birds with one stone using a representationally dense yet computationally sparse attentive feature contextualization scheme that is agnostic to specific 2D-to-3D feature lifting approaches. In experiments the proposed MvACon is thoroughly tested on the nuScenes benchmark using both the BEVFormer and its recent 3D deformable attention (DFA3D) variant as well as the PETR showing consistent detection performance improvement especially in enhancing performance in location orientation and velocity prediction. It is also tested on the Waymo-mini benchmark using BEVFormer with similar improvement. We qualitatively and quantitatively show that global cluster-based contexts effectively encode dense scene-level contexts for MV3D object detection. The promising results of our proposed MvACon reinforces the adage in computer vision "(contextualized) feature matters".

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision

🧭 Keyword Pioneer — feature contextualization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Xianpeng Liu , Ce Zheng , Ming Qian , Nan Xue , Chen Chen , Zhebin Zhang , Chen Li , Tianfu Wu

Topics

Artificial Intelligence > Core AI > Autonomous Vehicles Computer Vision > Analysis > 3D Vision Computer Vision > Analysis > Object Detection Computer Vision > Domain-Specific > Autonomous Driving

Keywords

attention mechanism autonomous driving multi-modal learning query-based detection multi-view 3d object detection feature contextualization

Download PDF

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024