Computational Methods for the Analysis of Complementizer Variability in Language and Literature: The Case of Hebrew “she-” and “ki”

Avi Shmidman; Aynat Rubinstein

EMNLP 2024

Abstract

We demonstrate a computational method for analyzing complementizer variability within language and literature, focusing on Hebrew as a test case. The primary complementizers in Hebrew are “she-” and “ki”. We first run a large-scale corpus analysis to determine the relative preference for one or the other of these complementizers given the preceding verb. On top of this foundation, we leverage clustering methods to measure the degree of interchangeability between the complementizers for each verb. The resulting tables, which provide this information for all common complement-taking verbs in Hebrew, are a first-of-its-kind lexical resource which we provide to the NLP community. Building on this resource, we demonstrate a computational method for analyzing literary works to find unusual and unexpected complementizer usages that deserve literary analysis.
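The first step described in the abstract, measuring per-verb preference between the two complementizers, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the `(verb, complementizer)` input format, the 0.9 threshold, and the toy data are hypothetical and are not taken from the paper. The clustering-based interchangeability measure is not sketched.

```python
from collections import Counter, defaultdict


def complementizer_preference(pairs):
    """Return {verb: share of "she-" among its "she-"/"ki" complements}."""
    counts = defaultdict(Counter)
    for verb, comp in pairs:
        if comp in ("she-", "ki"):
            counts[verb][comp] += 1
    return {
        verb: c["she-"] / (c["she-"] + c["ki"])
        for verb, c in counts.items()
    }


def unexpected_usages(pairs, preference, threshold=0.9):
    """Return the (verb, comp) pairs that go against a strong corpus preference."""
    flagged = []
    for verb, comp in pairs:
        share = preference.get(verb)
        if share is None:
            continue  # verb not seen in the reference corpus
        if comp == "ki" and share >= threshold:
            flagged.append((verb, comp))
        elif comp == "she-" and share <= 1 - threshold:
            flagged.append((verb, comp))
    return flagged


# Toy, invented observations (hypothetical; not data from the paper).
corpus = [("verb_a", "she-")] * 9 + [("verb_a", "ki")] + [("verb_b", "ki")] * 4
pref = complementizer_preference(corpus)

# Hypothetical usages drawn from a literary work.
literary = [("verb_a", "ki"), ("verb_b", "ki")]
flagged = unexpected_usages(literary, pref)
```

In this toy setup, a verb that takes “she-” in nine of ten corpus occurrences is flagged when a literary text pairs it with “ki”, while a verb that always takes “ki” is not. A simple ratio with a fixed threshold is only one possible design; the paper's actual scoring may differ.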

Topics

Machine Learning > Core Methods > Clustering
Machine Learning > Application Areas > Data Augmentation
Natural Language Processing > Applications > Information Retrieval
Interdisciplinary > Linguistics > Computational Linguistics
Natural Language Processing > Applications > Text Processing

Keywords

natural language processing, computational linguistics, clustering method, lexical resource, corpus analysis, Hebrew language, complementizer variation
