2026 EACL EACL 2026

Cross-Lingual and Cross-Domain Transfer Learning for POS Tagging in Historical Germanic Low-Resource Languages

Abstract

AbstractAlthough Part-of-Speech (POS) tagging has been widely studied, it still presents several challenges, particularly reduced performance on out-of-domain data. While increasing in-domain training data can be effective, this strategy is often impractical in historical low-resource settings. Cross-lingual transfer learning has shown promise for low-resource languages; however, its impact on domain generalization has received limited attention and may remain insufficient when used in isolation. This study focuses on cross-lingual and cross-domain transfer learning for POS tagging on four historical Germanic low-resource languages in two literary genres. For each language, POS tagged data were extracted and mapped to the Universal Dependencies UPOS tag set to establish a monolingual baseline and train three multilingual models in two dataset configurations. The results were consistent with previous findings, indicating that structural differences between the genres can negatively influence transfer learning. The poetry-only multilingual model showed improvements within that domain compared to the baseline. In contrast, multilingual models trained with all available data had lower performance caused by substantial structural differences in the corpora. This study underlines the importance of investigating the domain-generalization abilities of the models, which may be negatively influenced by substantial structural differences between data. In addition, it sheds light on the study of historical low-resource languages.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio