WACV 2025

DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification

Abstract

Current multi-modal object re-identification approaches based on large-scale pre-trained backbones (i.e., ViT) have displayed remarkable progress and achieved excellent performance. However, these methods usually adopt the standard full fine-tuning paradigm, which requires the optimization of considerable backbone parameters, causing extensive computational and storage requirements. In this work, we propose an efficient prompt-tuning framework tailored for multi-modal object re-identification, dubbed DMPT, which freezes the main backbone and only optimizes several newly added decoupled modality-aware parameters. Specifically, we explicitly decouple the visual prompts into modality-specific prompts, which leverage prior modality knowledge from a powerful text encoder, and modality-independent semantic prompts, which extract semantic information from multi-modal inputs such as visible, near-infrared, and thermal-infrared. Built upon the extracted features, we further design a Prompt Inverse Bind (PromptIBind) strategy that employs bind prompts as a medium to connect the semantic prompt tokens of different modalities and facilitates the exchange of complementary multi-modal information, boosting final re-identification results. Experimental results on multiple common benchmarks demonstrate that our DMPT can achieve results competitive with existing state-of-the-art methods while requiring only 6.5% fine-tuning of the backbone parameters.
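
The parameter-efficiency argument can be illustrated with a small accounting sketch: when the backbone is frozen, only the added prompt tokens contribute trainable parameters. Everything below is an assumption made for illustration: the function names, the prompt counts per modality, and the layer count are hypothetical and are not taken from the paper. The backbone size is set to roughly that of ViT-B/16 (about 86M parameters). The sketch is not expected to reproduce the 6.5% figure, which presumably also covers components not modeled here.

```python
# Illustrative accounting of trainable vs. frozen parameters in prompt tuning.
# All configuration values are hypothetical, not taken from the DMPT paper.


def prompt_param_count(num_prompts: int, embed_dim: int, num_layers: int) -> int:
    """Learnable scalars when `num_prompts` prompt tokens of width `embed_dim`
    are inserted at each of `num_layers` transformer layers."""
    return num_prompts * embed_dim * num_layers


def trainable_ratio(trainable: int, frozen_backbone: int) -> float:
    """Trainable parameters expressed as a fraction of the frozen backbone size."""
    return trainable / frozen_backbone


# Assumed configuration (for illustration only).
EMBED_DIM = 768               # token width of a ViT-B style backbone
NUM_LAYERS = 12               # layers that receive prompts (assumed)
BACKBONE_PARAMS = 86_000_000  # approximate ViT-B/16 size, kept frozen
MODALITIES = ("visible", "near_infrared", "thermal_infrared")
SPECIFIC_PROMPTS = 4          # modality-specific prompts per modality (assumed)
SEMANTIC_PROMPTS = 8          # semantic prompts per modality (assumed)

# Sum the prompt parameters over all modalities.
total_trainable = sum(
    prompt_param_count(SPECIFIC_PROMPTS + SEMANTIC_PROMPTS, EMBED_DIM, NUM_LAYERS)
    for _ in MODALITIES
)

ratio = trainable_ratio(total_trainable, BACKBONE_PARAMS)
print(f"trainable prompt parameters: {total_trainable}")
print(f"fraction of backbone size:   {ratio:.4%}")
```

Under full fine-tuning the same ratio would be 100%, since every backbone parameter is updated and must be stored per task; with a frozen backbone, only the small prompt tensors need to be optimized and saved.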
