EMNLP 2024
Individuation in Neural Models with and without Visual Grounding
Abstract
We show differences between the language-and-vision model CLIP and two text-only models, FastText and SBERT, in how they encode individuation information. We study the latent representations that CLIP provides for substrates, granular aggregates, and various numbers of objects. We demonstrate that CLIP embeddings capture quantitative differences in individuation better than the embeddings of models trained on text-only data. Moreover, the individuation hierarchy we deduce from the CLIP embeddings agrees with the hierarchies proposed in linguistics and cognitive science.
Topics
Machine Learning > Core Methods > Representation Learning
Machine Learning > Core Methods > Embedding Learning
Computer Vision > Core AI > Multimodal Learning
Deep Learning > Models > Transformers
Deep Learning > Learning Types > Representation Learning
Deep Learning > Learning Types > Multi-Modal Learning