COLING 2025

Leveraging Multilingual Models for Robust Grammatical Error Correction Across Low-Resource Languages

Abstract

Grammatical Error Correction (GEC) is a crucial task in Natural Language Processing (NLP) that aims to improve the quality of user-generated content, particularly for non-native speakers. This paper introduces a novel end-to-end architecture that uses the M2M100 multilingual transformer model to build a unified GEC system, with a focus on low-resource languages. We propose a synthetic data generation pipeline tailored to language-specific error categories. The system has been implemented for Spanish, and evaluations conducted by linguists with expertise in Spanish show promising results. Additionally, we present an analysis of user interactions, in which the actions performed by users reveal an acceptance rate of 88.2%.
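To make the idea of synthetic data generation concrete, the following is a minimal sketch of how (erroneous, correct) sentence pairs might be produced by injecting errors into clean text. It is not the paper's pipeline: the abstract does not list the error categories used, so the single category here (omitted diacritics, a common slip in written Spanish) and the names `strip_diacritics`, `make_pair`, and the parameter `p` are illustrative assumptions.

```python
import random
import unicodedata


def strip_diacritics(word: str) -> str:
    """Remove diacritics (e.g. 'canción' -> 'cancion') while keeping 'ñ'/'Ñ'.

    Hypothetical helper: one assumed error category, not the paper's own.
    """
    out = []
    # Normalize to NFC first so that 'ñ' is a single code point.
    for ch in unicodedata.normalize("NFC", word):
        if ch in "ñÑ":
            out.append(ch)
            continue
        # Decompose the character and drop combining marks (category Mn).
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if unicodedata.category(c) != "Mn"))
    return "".join(out)


def make_pair(sentence: str, rng: random.Random, p: float = 0.5) -> tuple[str, str]:
    """Return (noisy, clean): each whitespace-separated token loses its
    diacritics with probability p. Assumes single spaces between tokens."""
    tokens = sentence.split()
    noisy = [strip_diacritics(t) if rng.random() < p else t for t in tokens]
    return " ".join(noisy), sentence


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the corruption is reproducible
    print(make_pair("El niño cantó una canción", rng, p=1.0))
```

Pairs of this form could serve as source/target examples when fine-tuning a sequence-to-sequence model. A realistic pipeline would cover several error categories and sample them at rates that reflect how often real writers make each mistake.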
