Ensemble Fine-tuned mBERT for Translation Quality Estimation

Shaika Chowdhury; Naouel Baili; Brian Vannah

EMNLP 2021

Abstract

Quality Estimation (QE) is an important component of the machine translation workflow because it assesses the quality of translated output without consulting reference translations. In this paper, we discuss our submission to the WMT 2021 QE Shared Task. We participate in the sentence-level sub-task of Task 2, which challenges participants to predict the HTER score as a measure of sentence-level post-editing effort. Our proposed system is an ensemble of multilingual BERT (mBERT)-based regression models, which are generated by fine-tuning on different input settings. It demonstrates comparable performance with respect to Pearson's correlation and beats the baseline system in MAE/RMSE for several language pairs. In addition, we adapt our system to the zero-shot setting by exploiting language pairs relevant to the target language and pseudo-reference translations.
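
The evaluation described above can be illustrated with a minimal sketch. It assumes the ensemble combines its member models by an unweighted average of their per-sentence predictions, which the abstract does not state, and it uses made-up scores rather than any results from the paper. The function names (`ensemble_mean`, `pearson`, `mae`, `rmse`) are hypothetical helpers introduced here for illustration only.

```python
import math

def ensemble_mean(predictions):
    """Average per-sentence scores across models.

    Assumption: a simple unweighted mean; the paper's actual
    combination rule may differ.
    """
    n_models = len(predictions)
    return [sum(column) / n_models for column in zip(*predictions)]

def pearson(x, y):
    """Pearson's correlation coefficient between two score lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def mae(x, y):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def rmse(x, y):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical HTER predictions from two fine-tuned models (three sentences each).
model_preds = [
    [0.10, 0.40, 0.70],
    [0.20, 0.50, 0.90],
]
# Hypothetical gold HTER scores for the same three sentences.
gold = [0.20, 0.40, 0.90]

ensemble = ensemble_mean(model_preds)
print("Pearson:", pearson(ensemble, gold))
print("MAE:", mae(ensemble, gold))
print("RMSE:", rmse(ensemble, gold))
```

Pearson's correlation rewards predictions that rank and scale sentences consistently with the gold scores, while MAE and RMSE measure absolute deviation, so a system can be comparable on one and better on the other.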

Topics

Deep Learning > Architectures > Transformers
Natural Language Processing > Applications > Machine Translation
Machine Learning > Learning Types > Transfer Learning
Machine Learning > Learning Types > Ensemble Learning
Deep Learning > Models > Transformers
Natural Language Processing > Applications > Quality Estimation

Keywords

zero-shot learning, ensemble learning, machine translation, quality estimation, regression model, multilingual BERT, translation pair
