2025
NAACL
NAACL 2025
Identifying and analyzing ‘noisy’ spelling errors in a second language corpus
Abstract
AbstractThis paper addresses the problem of identifying and analyzing ‘noisy’ spelling errors in texts written by second language (L2) learners’ texts in a written corpus. Using Python, spelling errors were identified in 5774 texts greater than or equal to 66 words (total=1,814,209 words), selected from a corpus of 4.2 million words (Authors-1). The statistical analysis used hurdle() models in R, which are appropriate for non-normal, count data, with many zeros.
🌉
Interdisciplinary Bridge
— Data Science & Analytics and Interdisciplinary
🧭
Keyword Pioneer
— hurdle model
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Speech & Audio