PEFT-BoA: Parameter-Efficient Fine-Tuning with Bag-of-Adapters for Multi-Modal Object Re-identification

Hongchao Li; Guangxing Liu; Xixi Wang; Baihe Liang; Yonglong Luo

2026 AAAI AAAI 2026

PEFT-BoA: Parameter-Efficient Fine-Tuning with Bag-of-Adapters for Multi-Modal Object Re-identification

Abstract

Abstract Multi-modal object Re-identification (ReID) aims to retrieve individuals by leveraging complementary information from different modalities. Recent CLIP-based approaches show promising results, but they usually employ prompt-based or hybrid prompt-adapter tuning and still face the problems of heterogeneous domain gap, fine-grained identity discrimination and noise instance interference. To address these problems, we introduce a novel Parameter-Efficient Fine-Tuning framework with Bag-of-Adapters (PEFT-BoA) based on the pre-trained CLIP's vision encoder for multi-modal object ReID. Specifically, we first propose a Domain-specific Patch Adapter (DPA) designed to bridge the visual feature gap between pre-trained and fine-tuned models at the local patch level. Meanwhile, we propose a Task-specific Class Adapter (TCA) enhance the fine-grained identity discrimination ability by optimizing global class token. Finally, we propose an Instance-specific Fusion Adapter (IFA) dynamically selects and combines only the most useful features across different modalities for each instance. Our PEFT-BoA achieves the better performance on multi-modal object re-identification benchmarks, while maintaining fewer trainable parameters (6.62M) and a higher training throughput (246.2fps).

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning

🧭 Keyword Pioneer — pre-trained clip

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Hongchao Li , Guangxing Liu , Xixi Wang , Baihe Liang , Yonglong Luo

Topics

Artificial Intelligence > Core AI > Multimodal Learning Machine Learning > Application Areas > Efficient Computing Deep Learning > Techniques > Pretraining

Keywords

parameter-efficient fine-tuning multi-modal object re-identification pre-trained clip

Download PDF

Related papers

Hi-EF: Benchmarking Emotion Forecasting in Human-interaction 2026

MosaicDoc: A Large-Scale Bilingual Benchmark for Visually Rich Document Understanding 2026

Sparse3DPR: Training-Free 3D Hierarchical Scene Parsing and Task-Adaptive Subgraph Reasoning from Sparse RGB Views 2026

LayerEdit: Disentangled Multi-Object Editing via Conflict-Aware Multi-Layer Learning 2026

HDGS: Hierarchical Dynamic Gaussian Splatting for Urban Driving Scenes 2026