DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification

Minghui Lin; Shu Wang; Xiang Wang; Jianhua Tang; Longbin Fu; Zhengrong Zuo; Nong Sang

WACV 2025

Abstract

Current multi-modal object re-identification approaches based on large-scale pre-trained backbones (e.g., ViT) have shown remarkable progress and achieved excellent performance. However, these methods usually adopt the standard full fine-tuning paradigm, which requires optimizing a large number of backbone parameters and therefore incurs extensive computational and storage costs. In this work, we propose an efficient prompt-tuning framework tailored for multi-modal object re-identification, dubbed DMPT, which freezes the main backbone and optimizes only a small set of newly added, decoupled modality-aware parameters. Specifically, we explicitly decouple the visual prompts into modality-specific prompts, which leverage prior modality knowledge from a powerful text encoder, and modality-independent semantic prompts, which extract semantic information from multi-modal inputs such as visible, near-infrared, and thermal-infrared images. Building on the extracted features, we further design a Prompt Inverse Bind (PromptIBind) strategy that employs bind prompts as a medium to connect the semantic prompt tokens of different modalities and facilitate the exchange of complementary multi-modal information, boosting the final re-identification results. Experimental results on multiple common benchmarks demonstrate that DMPT achieves results competitive with existing state-of-the-art methods while fine-tuning only 6.5% of the backbone parameters.
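The parameter-efficiency argument above can be made concrete with a small counting sketch. It is not the authors' implementation: the config fields, prompt lengths, layer count, and the idea of one prompt set per layer are all hypothetical, and the abstract gives no such details. It only illustrates how learnable prompt tokens compare in size with a frozen backbone, and it does not attempt to reproduce the reported 6.5% figure, which presumably covers every trainable component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptConfig:
    # Every default below is an illustrative assumption, not a value from the paper.
    embed_dim: int = 768                      # token width of a ViT-B-style backbone
    num_layers: int = 12                      # layers that receive prompt tokens
    modalities: tuple = ("rgb", "nir", "tir")  # visible, near-infrared, thermal-infrared
    specific_len: int = 4                     # modality-specific prompt tokens per layer
    semantic_len: int = 4                     # semantic prompt tokens per layer
    bind_len: int = 2                         # bind prompt tokens shared by all modalities


def prompt_param_count(cfg: PromptConfig) -> int:
    """Count learnable prompt parameters; the backbone itself stays frozen."""
    per_modality = (cfg.specific_len + cfg.semantic_len) * cfg.embed_dim
    shared_bind = cfg.bind_len * cfg.embed_dim
    per_layer = len(cfg.modalities) * per_modality + shared_bind
    return cfg.num_layers * per_layer


def trainable_ratio(trainable: int, backbone: int) -> float:
    """Trainable parameter count expressed as a fraction of the backbone size."""
    if backbone <= 0:
        raise ValueError("backbone parameter count must be positive")
    return trainable / backbone


if __name__ == "__main__":
    cfg = PromptConfig()
    n_prompt = prompt_param_count(cfg)
    # 86M is the approximate size of a ViT-B/16; the paper's backbone may differ.
    ratio = trainable_ratio(n_prompt, 86_000_000)
    print(f"{n_prompt} prompt parameters, {ratio:.4%} of the backbone")
```

Under these assumed settings the prompt tokens alone are a tiny fraction of the backbone, which suggests that most of a budget like 6.5% would come from other trainable modules; that is an inference from the sketch, not a claim made in the abstract.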

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Optimization & Theory > Neural Network Optimization
Computer Vision > Analysis > Object Detection
Computer Vision > Analysis > Person Re-Identification
Machine Learning > Learning Types > Multi-Modal Learning
Deep Learning > Techniques > Transfer Learning

Keywords

vision transformer, multi-modal learning, parameter-efficient fine-tuning, modality fusion, prompt tuning, object re-identification, modality-aware learning
