EMNLP 2024

Verba volant, scripta volant? Don’t worry! There are computational solutions for protoword reconstruction

Abstract

We introduce a new database of cognate words and etymons for the five main Romance languages, the most comprehensive one to date. We propose a strong benchmark for the automatic reconstruction of protowords for Romance languages, by applying a set of machine learning models and features to these data. The best results reach 90% accuracy in predicting the protoword of a given cognate set, surpassing existing state-of-the-art results for this task and showing that computational methods can be very useful in assisting linguists with protoword reconstruction.
