Unify Word-level and Span-level Tasks: NJUNLP’s Participation for the WMT2023 Quality Estimation Shared Task

Xiang Geng; Zhejian Lai; Yu Zhang; Shimin Tao; Hao Yang; Jiajun Chen; Shujian Huang

2023 EMNLP EMNLP 2023

Unify Word-level and Span-level Tasks: NJUNLP’s Participation for the WMT2023 Quality Estimation Shared Task

Abstract

AbstractWe introduce the submissions of the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task. Our team submitted predictions for the English-German language pair on all two sub-tasks: (i) sentence- and word-level quality prediction; and (ii) fine-grained error span detection. This year, we further explore pseudo data methods for QE based on NJUQE framework (https://github.com/NJUNLP/njuqe). We generate pseudo MQM data using parallel data from the WMT translation task. We pre-train the XLMR large model on pseudo QE data, then fine-tune it on real QE data. At both stages, we jointly learn sentence-level scores and word-level tags. Empirically, we conduct experiments to find the key hyper-parameters that improve the performance. Technically, we propose a simple method that covert the word-level outputs to fine-grained error span results. Overall, our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks by a considerable margin.

🧭 Keyword Pioneer — fine-grained error

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Xiang Geng , Zhejian Lai , Yu Zhang , Shimin Tao , Hao Yang , Jiajun Chen , Shujian Huang

Topics

Natural Language Processing > Understanding > Semantic Analysis Natural Language Processing > Applications > Machine Translation

Keywords

machine translation quality estimation word-level prediction error span detection fine-grained error xlmr large model

Download PDF

Related papers

Exploring Linguistic Probes for Morphological Generalization 2023

NameGuess: Column Name Expansion for Tabular Data 2023

Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning 2023

Improving Conversational Recommendation Systems via Bias Analysis and Language-Model-Enhanced Data Augmentation 2023

On the Calibration of Large Language Models and Alignment 2023