Extending Automatic Machine Translation Evaluation to Book-Length Documents

Kuang-Da Wang; Shuoyang Ding; Chao-Han Huck Yang; Ping-Chun Hsieh; Wen-Chih Peng; Vitaly Lavrukhin; Boris Ginsburg

EMNLP 2025

Abstract

Although Large Language Models (LLMs) demonstrate superior translation performance and long-context capabilities, evaluation methodologies remain constrained to sentence-level assessment because of dataset limitations, token-length restrictions in metrics, and rigid sentence-boundary requirements. We introduce SEGALE, an evaluation scheme that extends existing automatic metrics to long-document translation by treating documents as continuous text and applying sentence segmentation and alignment methods. Our approach enables previously unattainable document-level evaluation: it handles translations of arbitrary length generated with document-level prompts while accounting for under- and over-translation and for varied sentence boundaries. Experiments show that our scheme significantly outperforms existing long-form document evaluation schemes and is comparable to evaluation performed with ground-truth sentence alignments. Additionally, we apply our scheme to book-length texts and show, for the first time, that many open-weight LLMs fail to translate documents effectively at their reported maximum context lengths.
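The segment-then-align idea in the abstract can be illustrated with a toy sketch. This is not SEGALE's actual algorithm: the regex splitter, the length-based pairing cost, and the names `split_sentences`, `align`, and `skip_cost` are all assumptions made for illustration. A real pipeline would presumably use a trained sentence segmenter and a cross-lingual similarity model, and would then pass each aligned pair to an existing sentence-level metric.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def align(ref, hyp, skip_cost=1.0):
    """Monotonic 1-1 alignment with skips, found by dynamic programming.

    Returns a list of (ref_sentence_or_None, hyp_sentence_or_None) pairs.
    A None on the hyp side marks a reference sentence with no counterpart
    (possible under-translation); a None on the ref side marks an extra
    hypothesis sentence (possible over-translation).
    """
    def pair_cost(a, b):
        # Toy stand-in for a similarity model: relative length difference.
        return abs(len(a) - len(b)) / max(len(a), len(b))

    n, m = len(ref), len(hyp)
    # cost[i][j] = cheapest way to align ref[:i] with hyp[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], back[i][0] = i * skip_cost, "ref_only"
    for j in range(1, m + 1):
        cost[0][j], back[0][j] = j * skip_cost, "hyp_only"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (cost[i - 1][j - 1] + pair_cost(ref[i - 1], hyp[j - 1]), "pair"),
                (cost[i - 1][j] + skip_cost, "ref_only"),
                (cost[i][j - 1] + skip_cost, "hyp_only"),
            ]
            cost[i][j], back[i][j] = min(options, key=lambda o: o[0])

    # Walk the back-pointers from the end to recover the alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "pair":
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif move == "ref_only":
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    return pairs[::-1]


# Usage: the hypothesis drops the middle sentence of the reference.
reference = "Short one. This is a much longer second sentence in the document. End."
hypothesis = "Short one. End."
for ref_sent, hyp_sent in align(split_sentences(reference), split_sentences(hypothesis)):
    print(repr(ref_sent), "<->", repr(hyp_sent))
```

In this example the dropped middle sentence ends up paired with `None`, so a downstream metric could penalize it instead of silently misaligning every sentence after it.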

Topics

Natural Language Processing > Applications > Machine Translation
Natural Language Processing > Applications > Text Classification
Machine Learning > Learning Types > Evaluation
Deep Learning > Learning Types > Transfer Learning

Keywords

machine translation, translation quality, sentence alignment, document-level evaluation, long-document translation, book-length translation
