DeepInteraction: 3D Object Detection via Modality Interaction

Zeyu Yang; Jiaqi Chen; Zhenwei Miao; Wei Li; Xiatian Zhu; Li Zhang

NeurIPS 2022

Abstract

Existing top-performing 3D object detectors typically rely on a multi-modal fusion strategy. This design is, however, fundamentally restricted: it overlooks useful modality-specific information, which ultimately hampers model performance. To address this limitation, we introduce a novel modality interaction strategy in which individual per-modality representations are learned and maintained throughout, so that their unique characteristics can be exploited during object detection. To realize this strategy, we design DeepInteraction, an architecture characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder. Experiments on the large-scale nuScenes dataset show that the proposed method surpasses all prior art, often by a large margin. Crucially, our method ranks first on the highly competitive nuScenes object detection leaderboard.
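
To make the contrast with fusion concrete, the following is a minimal sketch of what "maintaining per-modality representations while letting them interact" could look like. It is not the authors' encoder: the abstract does not specify the interaction operator, so bidirectional cross-attention is assumed here as a stand-in, and every name (`cross_attend`, `interaction_layer`, `a_feats`, `b_feats`) is hypothetical. The point is only that each layer returns two separate streams rather than one fused tensor.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(queries, context):
    # Each query vector gathers a softmax-weighted average of the context
    # vectors and adds it to itself (residual), so the query stream keeps
    # its own identity and length.
    scale = math.sqrt(len(queries[0]))
    updated = []
    for q in queries:
        weights = softmax([dot(q, c) / scale for c in context])
        gathered = [sum(w * c[i] for w, c in zip(weights, context))
                    for i in range(len(q))]
        updated.append([qi + gi for qi, gi in zip(q, gathered)])
    return updated

def interaction_layer(a_feats, b_feats):
    # Assumed interaction step: both streams are updated from each other,
    # but they are returned separately instead of being merged.
    new_a = cross_attend(a_feats, b_feats)
    new_b = cross_attend(b_feats, a_feats)
    return new_a, new_b

# Toy features for two modalities (illustrative values, shared dimension 2).
a_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_feats = [[0.5, 0.5], [2.0, 0.0]]

# Stack two interaction layers; both representations persist throughout.
for _ in range(2):
    a_feats, b_feats = interaction_layer(a_feats, b_feats)

print(len(a_feats), len(b_feats))
```

A fusion-style design would instead collapse `a_feats` and `b_feats` into a single representation after the first step; keeping both lets a downstream decoder consult either stream on its own terms.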

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Computer Vision > Analysis > 3D Vision

Keywords

autonomous driving; 3D object detection; multi-modal fusion; modality interaction; per-modality representation
