On the differences between BERT and MT encoder spaces and how to address them in translation tasks

Raúl Vázquez; Hande Celikkanat; Mathias Creutz; Jörg Tiedemann

2021 ACL ACL 2021

On the differences between BERT and MT encoder spaces and how to address them in translation tasks

Abstract

AbstractVarious studies show that pretrained language models such as BERT cannot straightforwardly replace encoders in neural machine translation despite their enormous success in other tasks. This is even more astonishing considering the similarities between the architectures. This paper sheds some light on the embedding spaces they create, using average cosine similarity, contextuality metrics and measures for representational similarity for comparison, revealing that BERT and NMT encoder representations look significantly different from one another. In order to address this issue, we propose a supervised transformation from one into the other using explicit alignment and fine-tuning. Our results demonstrate the need for such a transformation to improve the applicability of BERT in MT.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Raúl Vázquez , Hande Celikkanat , Mathias Creutz , Jörg Tiedemann

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning Machine Learning > Core Methods > Representation Learning Natural Language Processing > Applications > Machine Translation Artificial Intelligence > Core AI > Large Language Models Natural Language Processing > Generation > Machine Translation Deep Learning > Learning Types > Transfer Learning

Keywords

representation learning transfer learning neural machine translation pretrained language model encoder representation

Download PDF

Related papers

Out-of-Scope Intent Detection with Self-Supervision and Discriminative Training 2021

A Non-Autoregressive Edit-Based Approach to Controllable Text Simplification 2021

How Did This Get Funded?! Automatically Identifying Quirky Scientific Achievements 2021

Exploring Discourse Structures for Argument Impact Classification 2021

Language Embeddings for Typology and Cross-lingual Transfer Learning 2021