Are there identifiable structural parts in the sentence embedding whole?

Vivi Nastase; Paola Merlo

2024 EMNLP EMNLP 2024

Are there identifiable structural parts in the sentence embedding whole?

Abstract

AbstractSentence embeddings from transformer models encode much linguistic information in a fixed-length vector. We investigate whether structural information – specifically, information about chunks and their structural and semantic properties – can be detected in these representations. We use a dataset consisting of sentences with known chunk structure, and two linguistic intelligence datasets, whose solution relies on detecting chunks and their grammatical number, and respectively, their semantic roles. Through an approach involving indirect supervision, and through analyses of the performance on the tasks and of the internal representations built during learning, we show that information about chunks and their properties can be obtained from sentence embeddings.

❓ The Questioner

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — chunk detection

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Vivi Nastase , Paola Merlo

Topics

Machine Learning > Core Methods > Representation Learning Natural Language Processing > Resources & Methods > Text Representation Machine Learning > Learning Types > Representation Learning Natural Language Processing > Resources & Methods > Language Modeling Deep Learning > Learning Types > Representation Learning Artificial Intelligence > Core AI > Natural Language Processing

Keywords

sentence embedding linguistic structure indirect supervision transformer model structural analysis structural probing chunk detection grammatical number detection semantic role detection chunk structure

Download PDF

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024