Domain Prompt Learning with Quaternion Networks

Qinglong Cao; Zhengqin Xu; Yuntian Chen; Chao Ma; Xiaokang Yang

CVPR 2024

Abstract

Prompt learning has emerged as an effective and data-efficient technique in large Vision-Language Models (VLMs). However, when adapting VLMs to specialized domains such as remote sensing and medical imaging, domain prompt learning remains underexplored. While large-scale domain-specific foundation models can help tackle this challenge, their concentration on a single vision level makes it challenging to prompt both vision and language modalities. To overcome this, we propose to leverage domain-specific knowledge from domain-specific foundation models to transfer the robust recognition ability of VLMs from generalized to specialized domains, using quaternion networks. Specifically, the proposed method uses domain-specific vision features from domain-specific foundation models to guide the transformation of generalized contextual embeddings from the language branch into a specialized space within the quaternion networks. Moreover, we present a hierarchical approach that generates vision prompt features by analyzing intermodal relationships between hierarchical language prompt features and domain-specific vision features. In this way, quaternion networks can effectively mine the intermodal relationships in the specific domain, facilitating domain-specific vision-language contrastive learning. Extensive experiments on domain-specific datasets show that our proposed method achieves new state-of-the-art results in prompt learning.
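The abstract does not spell out the quaternion layer itself, so the following is only a generic sketch of the operation quaternion networks are usually built on: the Hamilton product, which mixes the four components of each quaternion with one another. The function names `hamilton_product` and `quaternion_linear`, the `(r, i, j, k)` storage order, and the array shapes are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def hamilton_product(q, p):
    """Hamilton product of quaternions stored as (..., 4) arrays in (r, i, j, k) order."""
    a, b, c, d = np.moveaxis(np.asarray(q, dtype=float), -1, 0)
    w, x, y, z = np.moveaxis(np.asarray(p, dtype=float), -1, 0)
    return np.stack(
        [
            a * w - b * x - c * y - d * z,  # real part
            a * x + b * w + c * z - d * y,  # i part
            a * y - b * z + c * w + d * x,  # j part
            a * z + b * y - c * x + d * w,  # k part
        ],
        axis=-1,
    )


def quaternion_linear(x, weight):
    """Hypothetical quaternion linear map (illustrative, not the paper's layer).

    x:      (n_in, 4) quaternion-valued input features
    weight: (n_out, n_in, 4) quaternion-valued weights
    Returns (n_out, 4), where out[o] = sum_i weight[o, i] * x[i] (Hamilton product).
    """
    x = np.asarray(x, dtype=float)
    weight = np.asarray(weight, dtype=float)
    return hamilton_product(weight, x[None, :, :]).sum(axis=1)


# Basis quaternions, used to check the multiplication rules.
ONE = np.array([1.0, 0.0, 0.0, 0.0])
I = np.array([0.0, 1.0, 0.0, 0.0])
J = np.array([0.0, 0.0, 1.0, 0.0])
K = np.array([0.0, 0.0, 0.0, 1.0])
```

Because every output component depends on all four input components, features placed in different components of the same quaternion (for example, features from different modalities, under the assumption above) are coupled by a single weight. The product is not commutative: `hamilton_product(I, J)` gives `K`, while `hamilton_product(J, I)` gives `-K`.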

Topics

Machine Learning > Application Areas > Domain Adaptation
Deep Learning > Architectures > Transformers
Computer Vision > Domain-Specific > Medical Imaging
Computer Vision > Domain-Specific > Remote Sensing
Machine Learning > Learning Paradigms > Transfer Learning
Computer Vision > Core AI > Multimodal Learning
Deep Learning > Models > Foundation Models
Deep Learning > Learning Types > Multi-Modal Learning
Deep Learning > Techniques > Fine-Tuning
Deep Learning > Learning Types > Domain Adaptation

Keywords

contrastive learning, transfer learning, domain adaptation, prompt learning, foundation model, vision-language model, quaternion network, domain-specific foundation model
