EMNLP 2023

Working Towards Digital Documentation of Uralic Languages With Open-Source Tools and Modern NLP Methods

Abstract

We present our work towards building an infrastructure for documenting endangered languages, with a focus on Uralic languages in particular. Our infrastructure consists of tools for writing dictionaries whose entries are structured in XML. These dictionaries are the foundation for rule-based NLP tools such as finite-state transducers (FSTs). We are also actively enhancing these dictionaries and tools with state-of-the-art neural models, which we train on data generated from the rules and lexica.
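To illustrate how an XML-structured dictionary can feed a rule-based tool, here is a minimal sketch in Python. The entry layout (`entry`, `lemma`, `pos`, `translation`), the Finnish sample words, and the lexicon names (`N_Stems`, `N_Infl`) are assumptions made for illustration only; the abstract does not show the project's actual schema or its FST sources. The output follows the general shape of lexc stem lexicons used by finite-state toolkits, but it is not a complete or compilable lexc file.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry layout; the real dictionary schema is not given in the abstract.
SAMPLE = """
<dictionary>
  <entry>
    <lemma pos="N">kala</lemma>
    <translation lang="en">fish</translation>
  </entry>
  <entry>
    <lemma pos="V">tulla</lemma>
    <translation lang="en">come</translation>
    <translation lang="en">arrive</translation>
  </entry>
</dictionary>
"""


def read_entries(xml_text):
    """Parse dictionary XML into a list of plain dicts."""
    root = ET.fromstring(xml_text.strip())
    entries = []
    for entry in root.iter("entry"):
        lemma = entry.find("lemma")
        entries.append({
            "lemma": lemma.text,
            "pos": lemma.get("pos"),
            "translations": [t.text for t in entry.findall("translation")],
        })
    return entries


def to_lexc(entries):
    """Group lemmas by part of speech and emit lexc-style stem lines.

    The lexicon names (<POS>_Stems, <POS>_Infl) are illustrative placeholders.
    """
    by_pos = {}
    for e in entries:
        by_pos.setdefault(e["pos"], []).append(e["lemma"])
    lines = []
    for pos in sorted(by_pos):
        lines.append(f"LEXICON {pos}_Stems")
        for lemma in sorted(by_pos[pos]):
            lines.append(f"{lemma} {pos}_Infl ;")
    return "\n".join(lines)


entries = read_entries(SAMPLE)
lexc_text = to_lexc(entries)
print(lexc_text)
```

Keeping the lexicon in XML and deriving the FST source from it means a single dictionary edit can propagate to every downstream tool, which is one plausible reason for the design described above.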
