Cross-Modal Feature Alignment and MMD Improve Robustness of Prompt Tuning

Jingchen Sun; Rohan Sharma; Vishnu Lokhande; Changyou Chen

WACV 2025

Abstract

Prompt tuning has emerged as a prominent research paradigm for adapting vision-language models to various downstream tasks. However, recent research indicates that prompt tuning methods often overfit because of limited training samples. In this paper, we propose a Cross-modal Aligned Feature Tuning (CRAFT) method to address this issue. Cross-modal alignment is conducted by first selecting anchors from the alternative domain and then deriving relative representations of the embeddings with respect to the selected anchors. Optimizing a feature alignment loss over the anchor-aligned text and image modalities creates a more unified text-image common space. Overfitting in prompt tuning also degrades model performance on out-of-distribution samples. To further improve the prompt model's robustness, we propose minimizing the Maximum Mean Discrepancy (MMD) over the anchor-aligned feature spaces to mitigate domain shift. Experiments on four different prompt tuning structures consistently show improvements from our method, with increases of up to 6.1% in the Base-to-Novel generalization task, 5.8% in the group robustness task, and 2.7% in the out-of-distribution tasks. The code is available at https://github.com/Jingchensun/Craft.
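
The two ingredients named in the abstract, anchor-relative representations and MMD, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes cosine similarity to the anchors as the relative representation, a Gaussian (RBF) kernel with a fixed bandwidth, and the biased estimator of squared MMD. The function names, the random features, and the randomly drawn anchors are all hypothetical stand-ins for real encoder outputs and for the paper's anchor selection.

```python
import numpy as np

def relative_representation(features, anchors):
    """Map (n, d) features to (n, k) cosine similarities against k anchors."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return f @ a.T

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    sq_dist = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Hypothetical stand-ins for image/text embeddings and selected anchors.
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(32, 16))
text_feats = rng.normal(size=(32, 16))
anchors = rng.normal(size=(8, 16))  # assumption: anchors drawn at random

img_rel = relative_representation(image_feats, anchors)
txt_rel = relative_representation(text_feats, anchors)
gap = mmd2(img_rel, txt_rel)  # quantity a training loss could penalize
```

In this sketch both modalities are projected onto the same anchor set, so their relative representations live in a shared k-dimensional space where a distribution-level distance such as MMD can be compared directly. How anchors are chosen and how the loss terms are weighted are details of the paper that this example does not attempt to reproduce.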

Authors

Jingchen Sun, Rohan Sharma, Vishnu Lokhande, Changyou Chen

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Application Areas > Domain Generalization

Keywords

domain generalization, maximum mean discrepancy, cross-modal alignment, prompt tuning
