JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models

Yuncheng Guo; Xiaodong Gu

2024 CVPR CVPR 2024

JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models

Abstract

Leveraging few-shot datasets in prompt learning for Vision-Language Models eliminates the need for manual prompt engineering while highlighting the necessity of accurate annotations for the labels. However high-level or complex label noise challenges prompt learning for Vision-Language Models. Aiming at this issue we propose a new framework for improving its robustness. Specifically we introduce the Joint Adaptive Partitioning for Label Refurbishment (JoAPR) a structured framework encompassing two key steps. 1) Data Partitioning where we differentiate between clean and noisy data using joint adaptive thresholds. 2) Label Refurbishment where we correct the labels based on the partition outcomes before retraining the network. Our comprehensive experiments confirm that JoAPR substantially enhances the robustness of prompt learning for Vision-Language Models against label noise offering a promising direction for future research.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Yuncheng Guo , Xiaodong Gu

Topics

Machine Learning > Learning Types > Weakly Supervised Learning Machine Learning > Optimization & Theory > Optimization Deep Learning > Models > Vision-Language Models Deep Learning > Learning Types > Prompt Engineering

Keywords

prompt learning label noise adaptive partitioning vision-language model

Download PDF

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024