Individuation in Neural Models with and without Visual Grounding

Alexey Tikhonov; Lisa Bylinina; Ivan P. Yamshchikov

2024 EMNLP EMNLP 2024

Individuation in Neural Models with and without Visual Grounding

Abstract

AbstractWe show differences between a language-and-vision model CLIP and two text-only models — FastText and SBERT — when it comes to the encoding of individuation information. We study latent representations that CLIP provides for substrates, granular aggregates, and various numbers of objects. We demonstrate that CLIP embeddings capture quantitative differences in individuation better than models trained on text-only data. Moreover, the individuation hierarchy we deduce from the CLIP embeddings agrees with the hierarchies proposed in linguistics and cognitive science.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning and Machine Learning

🧭 Keyword Pioneer — text-only model

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Alexey Tikhonov , Lisa Bylinina , Ivan P. Yamshchikov

Topics

Machine Learning > Core Methods > Representation Learning Machine Learning > Core Methods > Embedding Learning Computer Vision > Core AI > Multimodal Learning Deep Learning > Models > Transformers Deep Learning > Learning Types > Representation Learning Deep Learning > Learning Types > Multi-Modal Learning

Keywords

representation learning embedding learning multimodal learning visual grounding language model text embedding clip embedding text-only model

Download PDF

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024