2024 COLING COLING 2024

PaReNT (Parent Retrieval Neural Tool): A Deep Dive into Word Formation across Languages

Abstract

AbstractWe present PaReNT (Parent Retrieval Neural Tool), a deep-learning-based multilingual tool performing retrieval and word formation classification in English, German, Dutch, Spanish, French, Russian, and Czech. Parent retrieval refers to determining the lexeme or lexemes the input lexeme was based on (e.g. “darkness” is traced back to “dark”; “waterfall” decomposes into “water” and “fall”). Additionally, PaReNT performs word formation classification, which determines the input lexeme as a compound e.g. “proofread”, a derivative (e.g. “deescalate”) or as an unmotivated word (e.g. “dog”). These seven languages are selected from three major branches of the Indo-European language family (Germanic, Romance, Slavic). Data is aggregated from a range of word-formation resources, as well as Wiktionary, to train and test the tool. The tool is based on a custom-architecture hybrid transformer block-enriched sequence-to-sequence neural network utilizing both a character-based and semantic representation of the input lexemes, with two output modules - one decoder-based dedicated to parent retrieval, and one classifier-based for word formation classification. PaReNT achieves a mean accuracy of 0.62 in parent retrieval and a mean balanced accuracy of 0.74 in word formation classification.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning
🧭 Keyword Pioneer — parent retrieval
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Speech & Audio