2024 CVPR CVPR 2024

SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking

Abstract

Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness. Early research focused on fully fine-tuning RGB-based trackers which was inefficient and lacked generalized representation due to the scarcity of multimodal data. Therefore recent studies have utilized prompt tuning to transfer pre-trained RGB-based trackers to multimodal data. However the modality gap limits pre-trained knowledge recall and the dominance of the RGB modality persists preventing the full utilization of information from other modalities. To address these issues we propose a novel symmetric multimodal tracking framework called SDSTrack. We introduce lightweight adaptation for efficient fine-tuning which directly transfers the feature extraction ability from RGB to other domains with a small number of trainable parameters and integrates multimodal features in a balanced symmetric manner. Furthermore we design a complementary masked patch distillation strategy to enhance the robustness of trackers in complex environments such as extreme weather poor imaging and sensor failure. Extensive experiments demonstrate that SDSTrack outperforms state-of-the-art methods in various multimodal tracking scenarios including RGB+Depth RGB+Thermal and RGB+Event tracking and exhibits impressive results in extreme conditions. Our source code is available at : https://github.com/hoqolo/SDSTrack.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning and Machine Learning
🧭 Keyword Pioneer — multimodal tracking
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio