2025 NAACL NAACL 2025

Restructuring and visualising dialect dictionary data: Report on Erzya and Moksha materials

Abstract

AbstractThere are a number of Uralic dialect dictionaries based on fieldwork documentation of individual minority languages from the Pre-Soviet Era. The first of these published by the Finno-Ugrian Society features the Mordvin languages, Erzya and Moksha.In this article, we describe the possibility of reusing XML dialect dictionary collection point and phonetic variant data for visualizing informative linguistic isoglosses with R programming language’s Shiny web application frame-work.We provide a description of the ‘H. Paasonen Mordvin Dictionary’, which will possibly provide the reader with a better perspective of what data and challenges might present themselves in minority language dialect dictionaries.We provide a description of how we processed our data, and then we provide conclusions followed by a more extensive section on limitations. The conclusions state that only some of the data should be rendered with R Shiny web application, whereas some data might be better rendered by other applications.Our limitations section description calls for the extension the dialect dictionary database for a more concise description of the languageforms.

🧭 Keyword Pioneer — dialect dictionary
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Speech & Audio