Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions

Weizhen He; Yiheng Deng; Shixiang Tang; Qihao Chen; Qingsong Xie; Yizhou Wang; Lei Bai; Feng Zhu; Rui Zhao; Wanli Ouyang; Donglian Qi; Yunfeng Yan

CVPR 2024

Abstract

Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits real-world applications. This paper strives to resolve this problem by proposing a new instruct-ReID task that requires the model to retrieve images according to given image or language instructions. Instruct-ReID is a more general ReID setting in which six existing ReID tasks can be viewed as special cases obtained by designing different instructions. We propose a large-scale OmniReID benchmark and an adaptive triplet loss as a baseline method to facilitate research in this new setting. Experimental results show that the proposed multi-purpose ReID model, trained on our OmniReID benchmark without fine-tuning, improves mAP by +0.5%, +0.6%, and +7.7% on Market1501, MSMT17, and CUHK03 for traditional ReID; by +6.4%, +7.1%, and +11.2% on PRCC, VC-Clothes, and LTCC for clothes-changing ReID; by +11.7% on COCAS+ real2 for clothes-template-based clothes-changing ReID when using only RGB images; and by +24.9% on COCAS+ real2 for our newly defined language-instructed ReID. It also improves results by +4.3% on LLCM for visible-infrared ReID and by +2.6% on CUHK-PEDES for text-to-image ReID. The datasets, the model, and the code are available at https://github.com/hwz-zju/Instruct-ReID.
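
The abstract names an adaptive triplet loss as the baseline method but does not define it on this page. For orientation only, the sketch below shows the standard (non-adaptive) triplet margin loss, which is not the paper's loss. The function names, the margin of 0.3, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet margin loss (NOT the paper's adaptive variant).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`; otherwise it grows linearly.
    The default margin is an illustrative assumption.
    """
    d_ap = euclidean(anchor, positive)  # anchor to same-identity sample
    d_an = euclidean(anchor, negative)  # anchor to different-identity sample
    return max(0.0, d_ap - d_an + margin)


# Toy 2-D features (hypothetical values, not from any dataset).
anchor = [0.0, 0.0]
same_person = [0.0, 1.0]
other_person = [3.0, 4.0]

well_separated = triplet_loss(anchor, same_person, other_person)
violated = triplet_loss(anchor, other_person, same_person)
```

In the first call the negative is already far enough away, so the loss is zero. In the second call the roles are swapped, so the loss is positive and would push the embeddings apart during training.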

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Core Methods > Metric Learning
Machine Learning > Learning Types > Zero-Shot Learning
Computer Vision > Analysis > Person Re-Identification
Artificial Intelligence > Core AI > Computer Vision
Deep Learning > Learning Types > Multi-Modal Learning
Deep Learning > Learning Types > Multi-Task Learning
Artificial Intelligence > Core AI > Multi-Modal Learning

Keywords

multi-modal learning; person re-identification; triplet loss; text-to-image retrieval; language instruction; instruction-based retrieval; adaptive triplet loss; language-instructed ReID; text-to-image re-identification; visible-infrared re-identification; multi-purpose model
