EMNLP 2024
Verba volant, scripta volant? Don’t worry! There are computational solutions for protoword reconstruction
Abstract
We introduce a new database of cognate words and etymons for the five main Romance languages, the most comprehensive one to date. We propose a strong benchmark for the automatic reconstruction of protowords for Romance languages, by applying a set of machine learning models and features on these data. The best results reach 90% accuracy in predicting the protoword of a given cognate set, surpassing existing state-of-the-art results for this task and showing that computational methods can be very useful in assisting linguists with protoword reconstruction.
Topics
Artificial Intelligence > Core AI > Foundation Models
Machine Learning > Core Methods > Classification
Machine Learning > Core Methods > Regression
Natural Language Processing > Applications > Text Classification
Data Science & Analytics > Methods > Data Mining
Interdisciplinary > Linguistics > Computational Linguistics
Machine Learning > Learning Types > Classification