Realistic Test-Time Adaptation of Vision-Language Models

Maxime Zanella; Clément Fuchs; Christophe De Vleeschouwer; Ismail Ben Ayed

CVPR 2025

Abstract

The zero-shot capabilities of Vision-Language Models (VLMs) have been widely leveraged to improve predictive performance. However, previous works on transductive or test-time adaptation (TTA) often make strong assumptions about the data distribution, such as the presence of all classes. Our work challenges these favorable deployment scenarios and introduces a more realistic evaluation framework, including (i) a variable number of effective classes for adaptation within a single batch, and (ii) non-i.i.d. batches of test samples in online adaptation settings. We provide comprehensive evaluations, comparisons, and ablation studies that demonstrate how current transductive or TTA methods for VLMs systematically compromise the models' initial zero-shot robustness across various realistic scenarios, favoring performance gains under advantageous assumptions about the test sample distributions. Furthermore, we introduce StatA, a versatile method that can handle a wide range of deployment scenarios, including those with a variable number of effective classes at test time. Our approach incorporates a novel regularization term designed specifically for VLMs, which acts as a statistical anchor preserving the initial text-encoder knowledge, particularly in low-data regimes. Code available at https://github.com/MaxZanella/StatA.
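The two evaluation conditions named in the abstract, (i) a batch containing only a subset of the classes and (ii) non-i.i.d. batches in an online stream, can be illustrated with a small sampling sketch. This is not the authors' protocol: the function names `sample_batch` and `non_iid_stream`, their parameters, and the choice of a fully class-sorted stream as the non-i.i.d. case are assumptions made here for illustration only.

```python
import random
from collections import defaultdict


def sample_batch(labels, batch_size, num_effective_classes, seed=0):
    """Hypothetical helper: return indices of a batch drawn from only
    `num_effective_classes` randomly chosen classes."""
    rng = random.Random(seed)
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    # Pick which classes are present in this batch.
    chosen = rng.sample(sorted(by_class), num_effective_classes)
    pool = [i for c in chosen for i in by_class[c]]
    # Draw the batch without replacement from the restricted pool.
    return rng.sample(pool, min(batch_size, len(pool)))


def non_iid_stream(labels, batch_size, seed=0):
    """Hypothetical helper: yield batches from a class-sorted stream,
    an extreme case of label correlation between consecutive samples."""
    rng = random.Random(seed)
    order = list(range(len(labels)))
    rng.shuffle(order)
    # Stable sort: samples are grouped by class, shuffled within each class.
    order.sort(key=lambda i: labels[i])
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


if __name__ == "__main__":
    toy_labels = [i % 10 for i in range(200)]  # 10 classes, 20 samples each
    batch = sample_batch(toy_labels, batch_size=32, num_effective_classes=3)
    print(sorted({toy_labels[i] for i in batch}))
    for b in non_iid_stream(toy_labels, batch_size=20):
        print(sorted({toy_labels[i] for i in b}))
```

Sweeping `num_effective_classes` from 1 up to the total number of classes gives a range of batch compositions, from highly restricted to the favorable "all classes present" setting that the abstract says earlier methods tend to assume.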

Topics

Artificial Intelligence > Core AI > Foundation Models
Machine Learning > Application Areas > Domain Adaptation
Artificial Intelligence > Learning Paradigms > Zero-Shot Learning
Deep Learning > Learning Types > Transfer Learning
Deep Learning > Learning Types > Zero-Shot Learning
Deep Learning > Learning Types > Domain Adaptation

Keywords

zero-shot learning, domain adaptation, transductive learning, test-time adaptation, distribution shift, vision-language model, zero-shot classification
