LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections

Muhammad Jehanzeb Mirza; Leonid Karlinsky; Wei LIN; Horst Possegger; Mateusz Kozinski; Rogerio Feris; Horst Bischof

NeurIPS 2023

Abstract

Recently, large-scale pre-trained Vision and Language (VL) models have set a new state-of-the-art (SOTA) in zero-shot visual classification, enabling open-vocabulary recognition of a potentially unlimited set of categories defined as simple language prompts. However, despite these great advances, the performance of these zero-shot classifiers still falls short of the results of dedicated (closed category set) classifiers trained with supervised fine-tuning. In this paper, we show, for the first time, how to reduce this gap without any labels and without any paired VL data, using an unlabeled image collection and a set of texts auto-generated by a Large Language Model (LLM) that describe the categories of interest and effectively substitute for labeled visual instances of those categories. Using our label-free approach, we attain significant performance improvements over the zero-shot performance of the base VL model and over other contemporary methods and baselines on a wide variety of datasets, demonstrating an absolute improvement of up to $11.7\%$ ($3.8\%$ on average) in the label-free setting. Moreover, despite our approach being label-free, we observe $1.3\%$ average gains over leading few-shot prompting baselines that do use 5-shot supervision.

Topics

Artificial Intelligence > Core AI > Foundation Models; Machine Learning > Learning Types > Unsupervised Learning; Artificial Intelligence > Learning Paradigms > Zero-Shot Learning

Keywords

zero-shot learning; vision language model; language prompt; open-vocabulary classification; unlabeled image
