Logits DeConfusion with CLIP for Few-Shot Learning

Shuo Li; Fang Liu; Zehua Hao; Xinyi Wang; Lingling Li; Xu Liu; Puhua Chen; Wenping Ma

CVPR 2025

Abstract

With its powerful visual-language alignment capability, CLIP performs well in zero-shot and few-shot learning tasks. However, our experiments show that CLIP's logits suffer from serious inter-class confusion in downstream tasks, and this ambiguity between categories substantially reduces accuracy. To address this challenge, we propose a novel method called Logits DeConfusion, which learns and eliminates inter-class confusion in logits by combining our Multi-level Adapter Fusion (MAF) module with our Inter-Class Deconfusion (ICD) module. MAF extracts features from different levels and fuses them uniformly to enhance feature representation. ICD learns to eliminate inter-class confusion in logits through a residual structure. Experimental results show that our method significantly improves classification performance and alleviates the inter-class confusion problem. The code is available at https://github.com/LiShuo1001/LDC.
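
The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how MAF fuses levels or how ICD parameterizes its residual, so the code below assumes that fusion is a plain element-wise mean over feature levels and that the confusion term is a linear map of the logits. The names `fuse_levels`, `deconfuse_logits`, `confusion_weights`, and `scale` are hypothetical.

```python
from typing import List


def fuse_levels(level_features: List[List[float]]) -> List[float]:
    """Fuse feature vectors from several levels by an element-wise mean.

    Assumption: uniform averaging stands in for the MAF fusion step.
    """
    num_levels = len(level_features)
    return [sum(values) / num_levels for values in zip(*level_features)]


def deconfuse_logits(
    logits: List[float],
    confusion_weights: List[List[float]],
    scale: float = 1.0,
) -> List[float]:
    """Residual correction: subtract an estimated confusion term from the logits.

    Assumption: confusion_weights[i][j] estimates how much of class j's logit
    leaks into class i. In a trained model these weights would be learned.
    """
    num_classes = len(logits)
    confusion = [
        sum(confusion_weights[i][j] * logits[j] for j in range(num_classes))
        for i in range(num_classes)
    ]
    # Residual structure: with all-zero weights the original logits pass through.
    return [z - scale * c for z, c in zip(logits, confusion)]


def argmax(values: List[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Toy example: class 1 narrowly beats class 0 because of confusion with it.
raw_logits = [2.0, 2.1, 0.5]
weights = [
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],  # class 1 absorbs 10% of class 0's logit
    [0.0, 0.0, 0.0],
]
corrected_logits = deconfuse_logits(raw_logits, weights)
```

In this toy case the raw logits predict class 1, while the corrected logits predict class 0. Because the correction is a residual, a model whose weights start at zero begins exactly at CLIP's original predictions and only departs from them as the confusion term is learned.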

Topics

Artificial Intelligence > Core AI > Foundation Models
Artificial Intelligence > Learning Paradigms > Few-Shot Learning
Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Application Areas > Domain Adaptation
Deep Learning > Architectures > Transformers
Machine Learning > Learning Paradigms > Few-Shot Learning
Machine Learning > Core Methods > Feature Learning
Computer Vision > Core AI > Multimodal Learning
Deep Learning > Learning Types > Transfer Learning
Deep Learning > Learning Types > Few-Shot Learning
Deep Learning > Models > Vision-Language Models

Keywords

zero-shot learning, few-shot learning, transfer learning, vision-language model, CLIP model, visual-language alignment, adapter fusion, inter-class confusion, multi-level adapter fusion
