Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral

António Farinhas; Nuno M Guerreiro; Sweta Agrawal; Ricardo Rei; Andre Martins

EMNLP 2025

Abstract

Larger models often outperform smaller ones but come with high computational costs. Cascading offers a potential solution. By default, it uses smaller models and defers only some instances to larger, more powerful models. However, designing effective deferral rules remains a challenge. In this paper, we propose a simple yet effective approach for machine translation, using existing quality estimation (QE) metrics as deferral rules. We show that QE-based deferral allows a cascaded system to match the performance of a larger model while invoking it for a small fraction (30% to 50%) of the examples, significantly reducing computational costs. We validate this approach through both automatic and human evaluation.
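The deferral idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cascade_translate`, the callables `small_translate`, `large_translate`, and `qe_score`, and the `defer_fraction` parameter are all hypothetical, and ranking a batch by QE score to pick the lowest-scoring fraction is only one plausible deferral rule (a fixed score threshold would be another).

```python
from typing import Callable, List, Sequence


def cascade_translate(
    sources: Sequence[str],
    small_translate: Callable[[str], str],   # hypothetical: cheap model
    large_translate: Callable[[str], str],   # hypothetical: expensive model
    qe_score: Callable[[str, str], float],   # hypothetical: QE(source, hypothesis), higher is better
    defer_fraction: float = 0.3,             # assumed budget: share of inputs sent to the large model
) -> List[str]:
    """Translate with the small model and defer the lowest-QE outputs to the large model."""
    if not 0.0 <= defer_fraction <= 1.0:
        raise ValueError("defer_fraction must be between 0 and 1")

    # Step 1: the small model translates every input.
    drafts = [small_translate(src) for src in sources]

    # Step 2: a reference-free QE metric scores each draft against its source.
    scores = [qe_score(src, hyp) for src, hyp in zip(sources, drafts)]

    # Step 3: pick the lowest-scoring drafts, up to the budget (rounded down).
    n_defer = int(defer_fraction * len(sources))
    ranked = sorted(range(len(sources)), key=lambda i: scores[i])
    deferred = set(ranked[:n_defer])

    # Step 4: only the deferred inputs are re-translated by the large model.
    return [
        large_translate(src) if i in deferred else drafts[i]
        for i, src in enumerate(sources)
    ]
```

With `defer_fraction` set between 0.3 and 0.5, the large model would be called on roughly the share of examples the abstract reports. Whether quality then matches the large model depends on how well the chosen QE metric separates good drafts from bad ones.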

Topics

Machine Learning > Application Areas > Efficient Computing
Natural Language Processing > Applications > Machine Translation
Artificial Intelligence > Core AI > Large Language Models
Natural Language Processing > Generation > Machine Translation
Deep Learning > Optimization & Theory > Efficient Computing

Keywords

machine translation, quality estimation, computational efficiency, computational cost, cascaded system, cascaded translation, computational cost reduction, model deferral
