Retrieval-enriched zero-shot image classification in low-resource domains

Nicola Dall’Asen; Yiming Wang; Enrico Fini; Elisa Ricci

EMNLP 2024

Abstract

Low-resource domains, characterized by scarce data and annotations, present significant challenges for language and visual understanding tasks, with the latter much under-explored in the literature. Recent advancements in Vision-Language Models (VLMs) have shown promising results in high-resource domains but fall short on low-resource concepts that are under-represented (e.g., only a handful of images per category) in the pre-training set. We tackle the challenging task of zero-shot low-resource image classification from a novel perspective. By leveraging a retrieval-based strategy, we achieve this in a training-free fashion. Specifically, our method, named CoRE (Combination of Retrieval Enrichment), enriches the representation of both query images and class prototypes by retrieving relevant textual information from large web-crawled databases. This retrieval-based enrichment significantly boosts classification performance by incorporating the broader contextual information relevant to the specific class. We validate our method on a newly established benchmark covering diverse low-resource domains, including medical imaging, rare plants, and circuits. Our experiments demonstrate that CoRE outperforms existing state-of-the-art methods that rely on synthetic data generation and model fine-tuning.
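
The enrichment idea described in the abstract can be illustrated with a small, self-contained sketch. It is not the CoRE implementation: the tiny 3-dimensional vectors stand in for embeddings that a vision-language model would produce, and the function names, the neighbour count `k`, the mixing weight `alpha`, and the simple averaging rule are all assumptions made for illustration. The sketch retrieves the most similar caption embeddings for both the query image and each class prototype, blends them into the original embedding, and classifies by cosine similarity without any training.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def retrieve_top_k(query, database, k):
    """Return the k database embeddings most similar to the query."""
    ranked = sorted(database, key=lambda e: cosine(query, e), reverse=True)
    return ranked[:k]

def enrich(embedding, database, k=2, alpha=0.5):
    """Blend an embedding with the mean of its k retrieved text embeddings.

    alpha is a hypothetical mixing weight: 1.0 keeps only the original
    embedding, 0.0 keeps only the retrieved context.
    """
    retrieved = retrieve_top_k(embedding, database, k)
    mean = [sum(col) / len(retrieved) for col in zip(*retrieved)]
    mixed = [
        alpha * x + (1 - alpha) * m
        for x, m in zip(normalize(embedding), normalize(mean))
    ]
    return normalize(mixed)

def classify(image_emb, class_embs, database, k=2, alpha=0.5):
    """Enrich the query and every class prototype, then pick the closest class."""
    query = enrich(image_emb, database, k, alpha)
    prototypes = {
        name: enrich(emb, database, k, alpha) for name, emb in class_embs.items()
    }
    return max(prototypes, key=lambda name: cosine(query, prototypes[name]))

# Toy stand-ins for embeddings of web-crawled captions (assumed values).
caption_db = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
]
# Toy stand-ins for text embeddings of the class names (assumed values).
class_embs = {"resistor": [1.0, 0.0, 0.2], "capacitor": [0.0, 1.0, 0.2]}
# Toy stand-in for the embedding of a query image (assumed values).
image_emb = [0.8, 0.3, 0.1]

predicted = classify(image_emb, class_embs, caption_db)
print(predicted)
```

In a realistic setting the brute-force `sorted` call would be replaced by an approximate nearest-neighbour index over a large caption collection, and the embeddings would come from a pretrained vision-language encoder; both choices are outside what the abstract specifies.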

Topics

Machine Learning > Learning Types > Zero-Shot Learning
Machine Learning > Application Areas > Domain Adaptation
Computer Vision > Analysis > Object Detection
Computer Vision > Domain-Specific > Medical Imaging
Deep Learning > Learning Types > Multi-Modal Learning
Deep Learning > Learning Types > Zero-Shot Learning
Deep Learning > Learning Types > Retrieval-Augmented Generation
Deep Learning > Models > Vision-Language Models

Keywords

image classification, zero-shot learning, low-resource learning, zero-shot image classification, vision-language model, retrieval-augmented generation, low-resource domain, retrieval-augmented classification, retrieval enrichment, web-crawled database, representation enrichment
