MTQ-Eval: Multilingual Text Quality Evaluation for Language Models
Abstract
The use of large language models (LLMs) to evaluate generated outputs is becoming an increasingly effective and scalable approach. However, it remains uncertain whether this capability extends beyond task-specific evaluation to more general assessments of text quality, particularly in multilingual contexts. In this study, we introduce MTQ-Eval, a novel framework for multilingual text quality evaluation. We automatically generate text quality preference data and train open-source base LLMs to align with ratings of high- and low-quality text. Our comprehensive evaluation across 115 languages demonstrates the improved performance of the proposed model. Additionally, we explore whether this enhanced ability to distinguish between high- and low-quality text translates into better performance on downstream tasks.