X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection

Marvin Klingner; Shubhankar Borse; Varun Ravi Kumar; Behnaz Rezaei; Venkatraman Narayanan; Senthil Yogamani; Fatih Porikli

2023 CVPR CVPR 2023

X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection

Abstract

Recent advances in 3D object detection (3DOD) have obtained remarkably strong results for LiDAR-based models. In contrast, surround-view 3DOD models based on multiple camera images underperform due to the necessary view transformation of features from perspective view (PV) to a 3D world representation which is ambiguous due to missing depth information. This paper introduces X3KD, a comprehensive knowledge distillation framework across different modalities, tasks, and stages for multi-camera 3DOD. Specifically, we propose cross-task distillation from an instance segmentation teacher (X-IS) in the PV feature extraction stage providing supervision without ambiguous error backpropagation through the view transformation. After the transformation, we apply cross-modal feature distillation (X-FD) and adversarial training (X-AT) to improve the 3D world representation of multi-camera features through the information contained in a LiDAR-based 3DOD teacher. Finally, we also employ this teacher for cross-modal output distillation (X-OD), providing dense supervision at the prediction stage. We perform extensive ablations of knowledge distillation at different stages of multi-camera 3DOD. Our final X3KD model outperforms previous state-of-the-art approaches on the nuScenes and Waymo datasets and generalizes to RADAR-based 3DOD. Qualitative results video at https://youtu.be/1do9DPFmr38.

🌉 Interdisciplinary Bridge — Computer Vision and Machine Learning

🧭 Keyword Pioneer — multi-camera 3d object detection

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Marvin Klingner , Shubhankar Borse , Varun Ravi Kumar , Behnaz Rezaei , Venkatraman Narayanan , Senthil Yogamani , Fatih Porikli

Topics

Machine Learning > Application Areas > Knowledge Distillation Computer Vision > Analysis > Object Detection Computer Vision > Domain-Specific > Autonomous Driving

Keywords

knowledge distillation feature distillation cross-modal distillation view transformation multi-camera 3d object detection

Download PDF

Related papers

CORA: Adapting CLIP for Open-Vocabulary Detection With Region Prompting and Anchor Pre-Matching 2023

3DAvatarGAN: Bridging Domains for Personalized Editable Avatars 2023

Physics-Driven Diffusion Models for Impact Sound Synthesis From Videos 2023

Transductive Few-Shot Learning With Prototype-Based Label Propagation by Iterative Graph Refinement 2023

EXIF As Language: Learning Cross-Modal Associations Between Images and Camera Metadata 2023