Combining Inherent Knowledge of Vision-Language Models with Unsupervised Domain Adaptation through Strong-Weak Guidance

Thomas Westfechtel; Dexuan Zhang; Tatsuya Harada

2025 WACV WACV 2025

Combining Inherent Knowledge of Vision-Language Models with Unsupervised Domain Adaptation through Strong-Weak Guidance

Abstract

Unsupervised domain adaptation (UDA) tries to overcome the tedious work of labeling data by leveraging a labeled source dataset and transferring its knowledge to a similar but different target dataset. Meanwhile current vision-language models exhibit remarkable zero-shot prediction capabilities. In this work we combine knowledge gained through UDA with the inherent knowledge of vision-language models. We introduce a strong-weak guidance learning scheme that employs zero-shot predictions to help align the source and target dataset. For the strong guidance we expand the source dataset with the most confident samples of the target dataset. Additionally we employ a knowledge distillation loss as weak guidance. The strong guidance uses hard labels but is only applied to the most confident predictions from the target dataset. Conversely the weak guidance is employed to the whole dataset but uses soft labels. The weak guidance is implemented as a knowledge distillation loss with (adjusted) zero-shot predictions. We show that our method complements and benefits from prompt adaptation techniques for vision-language models. We conduct experiments and ablation studies on three benchmarks (OfficeHome VisDA and DomainNet) outperforming state-of-the-art methods. Our ablation studies further demonstrate the contributions of different components of our algorithm.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Thomas Westfechtel , Dexuan Zhang , Tatsuya Harada

Topics

Artificial Intelligence > Core AI > Multimodal Learning Machine Learning > Application Areas > Domain Adaptation Machine Learning > Learning Types > Domain Adaptation Deep Learning > Techniques > Knowledge Distillation Deep Learning > Learning Types > Multi-Modal Learning

Keywords

zero-shot learning domain generalization knowledge distillation unsupervised domain adaptation vision-language model zero-shot prediction prompt adaptation

Download PDF

Related papers

Neural Graph Map: Dense Mapping with Efficient Loop Closure Integration 2025

ELMGS: Enhancing Memory and Computation Scalability through Compression for 3D Gaussian Splatting 2025

Feature Fusion Transferability Aware Transformer for Unsupervised Domain Adaptation 2025

Uncertainty-Aware Online Extrinsic Calibration: A Conformal Prediction Approach 2025

Disentangling Spatio-Temporal Knowledge for Weakly Supervised Object Detection and Segmentation in Surgical Video 2025