Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning

Hairui Ren; Fan Tang; He Zhao; Zixuan Wang; Dandan Guo; Yi Chang

CVPR 2025


Abstract

Fine-tuning vision-language models (VLMs) with large amounts of unlabeled data has recently garnered significant interest. However, a key challenge remains the lack of high-quality pseudo-labeled data. Current pseudo-labeling strategies often suffer from mismatches between semantic and visual information, which leads to sub-optimal performance of unsupervised prompt learning (UPL) methods. In this paper, we introduce a simple yet effective approach called Augmenting Discriminative Richness via Diffusions (AiR), which learns a richer, more discriminative representation of each class and thereby facilitates classification. Specifically, our approach includes a pseudo-label generation module that leverages high-fidelity synthetic samples to build an auxiliary classifier; this classifier captures richer visual variation, extending text-image-pair classification to a more robust image-image-pair classification. Additionally, we exploit the diversity of diffusion-based synthetic samples to enhance prompt learning, providing more information for semantic-visual alignment. Extensive experiments on five public benchmarks, including RESISC45 and Flowers102, under three learning paradigms, namely unsupervised learning (UL), semi-supervised learning (SSL), and transductive zero-shot learning (TRZSL), demonstrate that AiR achieves substantial and consistent performance improvements over state-of-the-art unsupervised prompt learning methods.
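The abstract describes pseudo-labeling that combines the usual text-image matching with an auxiliary classifier built from synthetic images (image-image matching). The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation. It assumes CLIP-style embeddings have already been computed, uses class-mean prototypes of synthetic-image embeddings as the auxiliary classifier, fuses the two softmax distributions with a hypothetical weight `alpha`, and keeps the top-`k` most confident samples per class (a selection rule assumed here). All names (`build_prototypes`, `pseudo_label`, `select_top_k`, `alpha`, `temperature`) are illustrative assumptions.

```python
import math

# Hypothetical sketch: embeddings are plain lists of floats, assumed non-zero,
# standing in for CLIP-style image/text features computed elsewhere.


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def softmax(scores, temperature):
    """Numerically stable softmax with a temperature (assumed hyperparameter)."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_prototypes(synthetic):
    """Auxiliary classifier: one unit-norm mean embedding per class,
    computed from embeddings of (diffusion-generated) synthetic images."""
    prototypes = {}
    for label, embs in synthetic.items():
        dim = len(embs[0])
        mean = [sum(e[i] for e in embs) / len(embs) for i in range(dim)]
        prototypes[label] = normalize(mean)
    return prototypes


def pseudo_label(image_emb, text_embs, prototypes, alpha=0.5, temperature=0.01):
    """Fuse text-image and image-image class probabilities; return (label, confidence)."""
    labels = sorted(text_embs)
    text_probs = softmax([cosine(image_emb, text_embs[c]) for c in labels], temperature)
    image_probs = softmax([cosine(image_emb, prototypes[c]) for c in labels], temperature)
    # Hypothetical fusion rule: convex combination weighted by alpha.
    fused = [alpha * t + (1 - alpha) * p for t, p in zip(text_probs, image_probs)]
    best = max(range(len(labels)), key=lambda k: fused[k])
    return labels[best], fused[best]


def select_top_k(unlabeled, text_embs, prototypes, k=1, alpha=0.5, temperature=0.01):
    """Keep the k most confident unlabeled samples per predicted class.
    Returns a sorted list of (sample_index, pseudo_label) pairs."""
    per_class = {}
    for idx, emb in enumerate(unlabeled):
        label, conf = pseudo_label(emb, text_embs, prototypes, alpha, temperature)
        per_class.setdefault(label, []).append((conf, idx))
    selected = []
    for label, items in per_class.items():
        # Stable sort: ties keep their original sample order.
        items.sort(key=lambda item: item[0], reverse=True)
        selected.extend((idx, label) for _, idx in items[:k])
    return sorted(selected)


# Toy usage with 2-D embeddings (illustrative values, not real features).
text_embs = {0: [1.0, 0.0], 1: [0.0, 1.0]}
prototypes = build_prototypes({0: [[1.0, 0.2], [1.0, -0.2]], 1: [[0.1, 1.0], [-0.1, 1.0]]})
unlabeled = [[0.9, 0.1], [0.2, 0.8], [0.8, 0.3]]
print(select_top_k(unlabeled, text_embs, prototypes, k=1, temperature=0.1))  # → [(0, 0), (1, 1)]
```

Setting `alpha=1.0` reduces this sketch to ordinary text-only zero-shot pseudo-labeling, which makes the contribution of the synthetic-image prototypes easy to ablate.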



Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Core Methods > Classification
Machine Learning > Learning Types > Unsupervised Learning
Machine Learning > Application Areas > Data Augmentation
Deep Learning > Models > Diffusion Models
Machine Learning > Learning Paradigms > Self-Supervised Learning

Keywords

unsupervised learning, image classification, data augmentation, prompt learning, diffusion model, vision-language model, pseudo-label generation, unsupervised prompt learning, semantic-visual alignment


Related papers

AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos 2025

SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding 2025

FADE: Frequency-Aware Diffusion Model Factorization for Video Editing 2025

Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning 2025

Reversible Decoupling Network for Single Image Reflection Removal 2025