COLING 2025

Cognate Detection for Historical Language Reconstruction of Proto-Sabean Languages: the Case of Ge’ez, Tigrinya, and Amharic

Abstract

As languages evolve, we risk losing ancestral languages. In this paper, we explore Historical Language Reconstruction (HLR) for Proto-Sabean languages, starting with the identification of cognates: sets of words in different related languages that descend from the same word in a common ancestral language. We (1) collect semantically related words in three Afro-Semitic languages from a three-way dictionary, (2) work with linguists to identify cognates and reconstruct their proto-forms, and (3) experiment with three automatic cognate detection methods to extract cognates from the semantically related words. We then experiment with in-context learning with GPT-4o to generate proto-language forms from the cognates, and we use Sequence-to-Sequence (Seq2Seq) models for HLR.
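The abstract does not name the three automatic cognate detection methods. As a purely illustrative baseline, one common family of approaches scores each pair of semantically related words by normalized edit distance and flags pairs below a cut-off as cognate candidates. The sketch below assumes romanized input strings; the function names, the `threshold` value of 0.5, the language keys, and the toy word forms are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute, each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the length of the longer word."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def candidate_cognates(entry: dict[str, str], threshold: float = 0.5) -> list[tuple[str, str]]:
    """Return language pairs whose forms for one meaning are close enough to flag.

    `entry` maps a (hypothetical) language key to a romanized form for a single
    dictionary meaning; `threshold` is an illustrative cut-off, not a tuned value.
    """
    return [
        (l1, l2)
        for (l1, w1), (l2, w2) in combinations(sorted(entry.items()), 2)
        if normalized_distance(w1, w2) <= threshold
    ]


# Toy, made-up strings (not attested Ge'ez, Tigrinya, or Amharic forms).
toy_entry = {"lang_1": "tabara", "lang_2": "tabare", "lang_3": "kolum"}
print(candidate_cognates(toy_entry))  # [('lang_1', 'lang_2')]
```

Surface similarity alone ignores regular sound correspondences, so pairs flagged this way would still need review by linguists, as in step (2) above.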
