Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP

Zeliang Zhang; Zhuo Liu; Mingqian Feng; Chenliang Xu

EMNLP 2024

Abstract

CLIP has demonstrated great versatility in adapting to various downstream tasks, such as image editing and generation, visual question answering, and video understanding. However, CLIP-based applications often misinterpret user intent, so that in image generation tasks the number of objects in the output differs from the number requested. In this work, we empirically investigate this quantity bias in CLIP. By carefully designing different experimental settings and datasets, we comprehensively evaluate CLIP’s understanding of quantity from text, image, and cross-modal perspectives. Our experimental results reveal a quantity bias in CLIP embeddings that undermines the reliability of downstream tasks.

Topics

Machine Learning > Optimization & Theory > Theory; Deep Learning > Architectures > Transformers

Keywords

empirical evaluation, cross-modal embedding, CLIP embedding, text-image alignment, quantity bias
