What Do Language Models Hear? Probing for Auditory Representations in Language Models

Jerry Ngo; Yoon Kim

ACL 2024

Abstract

This work explores whether language models encode meaningfully grounded representations of the sounds that objects make. We learn a linear probe that, given a snippet of audio related to an object, retrieves the correct text representation of that object; the sound representation is produced by a pretrained audio model. The probe is trained with a contrastive loss that pulls an object's language representation and sound representation toward each other. After training, we test the probe's ability to generalize to objects that were not seen during training. Across different language models and audio models, the probe generalizes above chance in many cases, indicating that, despite being trained only on raw text, language models encode grounded knowledge of sounds for some objects.
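The abstract describes the probe only at a high level, so the following is a minimal sketch of the general recipe under assumptions that are ours, not the paper's: a single matrix `W` maps an audio embedding into the language model's embedding space, similarity is a temperature-scaled dot product (temperature `tau`), the loss is a one-directional (audio-to-text) InfoNCE objective, and training uses plain gradient descent. Synthetic Gaussian vectors stand in for real audio-model and language-model embeddings, and the helpers `train_probe`, `info_nce_and_grad` and `retrieval_accuracy` are hypothetical names for illustration.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm (an assumed preprocessing step)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def info_nce_and_grad(audio: np.ndarray, text: np.ndarray, W: np.ndarray, tau: float):
    """Audio-to-text InfoNCE loss and its gradient with respect to the probe W.

    Row i of `audio` and row i of `text` describe the same object (the positive
    pair); every other text row in the batch acts as a negative.
    """
    n = audio.shape[0]
    logits = (audio @ W) @ text.T / tau                   # (n, n) similarity scores
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize the softmax
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss = -np.mean(np.diag(log_p))                       # -log p(correct text | audio)
    grad_logits = (np.exp(log_p) - np.eye(n)) / n         # d loss / d logits
    grad_W = audio.T @ grad_logits @ text / tau           # chain rule through (A W) T^T
    return loss, grad_W


def train_probe(audio, text, tau=0.1, lr=0.1, steps=300):
    """Fit the linear probe W (audio dim x text dim) by plain gradient descent."""
    W = np.zeros((audio.shape[1], text.shape[1]))
    losses = []
    for _ in range(steps):
        loss, grad_W = info_nce_and_grad(audio, text, W, tau)
        losses.append(loss)
        W -= lr * grad_W
    return W, losses


def retrieval_accuracy(audio, text, W) -> float:
    """Fraction of audio clips whose top-scoring candidate text is the matching object."""
    predicted = np.argmax((audio @ W) @ text.T, axis=1)
    return float(np.mean(predicted == np.arange(audio.shape[0])))


# Synthetic stand-ins for pretrained audio-model and language-model embeddings.
rng = np.random.default_rng(0)
n_objects, d_audio, d_text = 200, 16, 12
audio_emb = rng.normal(size=(n_objects, d_audio))
true_map = rng.normal(size=(d_audio, d_text))
text_emb = audio_emb @ true_map + 0.1 * rng.normal(size=(n_objects, d_text))
audio_emb, text_emb = l2_normalize(audio_emb), l2_normalize(text_emb)

# Train on 150 objects, then test on 50 objects the probe never saw.
train, test = slice(0, 150), slice(150, 200)
W, losses = train_probe(audio_emb[train], text_emb[train])
acc = retrieval_accuracy(audio_emb[test], text_emb[test], W)
chance = 1 / (n_objects - 150)
print(f"held-out top-1 accuracy {acc:.2f} vs chance {chance:.2f}")
```

Because retrieval takes an argmax over candidate texts, top-1 accuracy depends only on the direction of `W`, not its scale. In this top-1 setup with N held-out candidates, chance accuracy is 1/N, which is the kind of baseline an "above chance" comparison is measured against.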

Keywords

contrastive learning, cross-modal learning, linear probing, contrastive loss, audio representation, large language model, linear probe, auditory representation, grounded knowledge, grounded representation