EMNLP 2025
Statistical and Neural Methods for Hawaiian Orthography Modernization
Abstract
Hawaiian orthography employs two distinct spelling systems, both of which are used by communities of speakers today. These two spelling systems are distinguished by the presence of the ‘okina letter and kahakō diacritic, which represent glottal stops and long vowels, respectively. We develop several models ranging in complexity to convert between these two orthographies. Our results demonstrate that simple statistical n-gram models surprisingly outperform neural seq2seq models and LLMs, highlighting the potential for traditional machine learning approaches in a low-resource setting.
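To make the task concrete, here is a minimal sketch of the two conversion directions. It is not the authors' model: it assumes whitespace tokenization and a word-level most-frequent-form lookup (roughly a unigram baseline), and the function names `strip_modern_marks`, `train_unigram`, and `modernize` are hypothetical. Removing the marks is deterministic, while restoring them is ambiguous and has to be learned from modern-orthography text.

```python
import unicodedata
from collections import Counter, defaultdict

# The okina is U+02BB; text in the wild often substitutes U+2018, so both are handled.
OKINA_CHARS = {"\u02bb", "\u2018"}
MACRON = "\u0304"  # combining macron, i.e. the kahako after NFD decomposition


def strip_modern_marks(text: str) -> str:
    """Modern -> older orthography: drop the okina and the kahako."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = [ch for ch in decomposed if ch not in OKINA_CHARS and ch != MACRON]
    return unicodedata.normalize("NFC", "".join(kept))


def train_unigram(modern_words):
    """Map each stripped word to its most frequent modern spelling (assumed baseline)."""
    counts = defaultdict(Counter)
    for word in modern_words:
        counts[strip_modern_marks(word)][word] += 1
    return {old: forms.most_common(1)[0][0] for old, forms in counts.items()}


def modernize(text: str, table: dict) -> str:
    """Older -> modern orthography: look up each token, leaving unseen words unchanged."""
    return " ".join(table.get(word, word) for word in text.split())


# Toy training data, invented for illustration only.
table = train_unigram(["Hawai\u02bbi", "Hawai\u02bbi", "kahak\u014d"])
restored = modernize("Hawaii kahako aloha", table)
```

A word-level lookup like this ignores context, so it cannot separate words that collapse to the same spelling once the marks are removed. Conditioning on neighboring words or characters, as an n-gram model does, is one way to address that.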
Topics
Machine Learning > Learning Types > Unsupervised Learning
Machine Learning > Optimization & Theory > Optimization
Data Science & Analytics > Methods > Data Mining
Interdisciplinary > Linguistics > Computational Linguistics
Machine Learning > Learning Types > Supervised Learning
Natural Language Processing > Resources & Methods > Language Modeling
Deep Learning > Models > Large Language Models
Machine Learning > Core Methods > Sequence Modeling