CoNLL 2025

Vorm: Translations and a constrained hypothesis space support unsupervised morphological segmentation across languages

Abstract

This paper introduces Vorm, an unsupervised morphological segmentation system that leverages translation data to infer highly accurate morphological transformations, including less frequently modeled processes such as infixation and reduplication. The system is evaluated on standard benchmark data and on a novel, typologically diverse dataset of 37 languages. Its performance is competitive with, and sometimes superior to, that of existing systems on canonical segmentation, but more limited on surface segmentation.
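Surface segmentation splits the written word so that its pieces concatenate back to the original string. Canonical segmentation instead restores each morpheme's underlying form. Infixation and reduplication are hard for purely concatenative models because the affix lands inside the stem or copies it. The minimal sketch below illustrates these notions only. The helper names (`surface_segments`, `infix`, `full_reduplication`) and the example words (English *achievability*, Tagalog *sulat* with infix *-um-*, Indonesian *buku* reduplicated to mark plurality) are hypothetical illustrations, not Vorm's implementation or data.

```python
# Illustrative sketch only: hypothetical helpers, not part of Vorm.

def surface_segments(word: str, boundaries: list[int]) -> list[str]:
    """Split a word at character offsets; the pieces concatenate back to the word."""
    cuts = [0] + boundaries + [len(word)]
    return [word[i:j] for i, j in zip(cuts, cuts[1:])]


def infix(stem: str, affix: str, position: int) -> str:
    """Insert an affix inside the stem after `position` characters (infixation)."""
    return stem[:position] + affix + stem[position:]


def full_reduplication(stem: str, sep: str = "-") -> str:
    """Copy the entire stem, joining the two copies with a separator."""
    return stem + sep + stem


word = "achievability"
# Surface segmentation: pieces are substrings of the written form.
surface = surface_segments(word, [6, 10])
# Canonical segmentation: pieces are underlying morpheme forms.
canonical = ["achieve", "able", "ity"]

print(surface)                      # ['achiev', 'abil', 'ity']
print(canonical)                    # ['achieve', 'able', 'ity']
print(infix("sulat", "um", 1))      # sumulat
print(full_reduplication("buku"))   # buku-buku
```

Note that only the surface pieces join back into *achievability*. The canonical pieces require undoing spelling changes, which is the kind of transformation the abstract describes Vorm as inferring.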
