Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT

Shijie Wu; Mark Dredze

2019 EMNLP EMNLP 2019

Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT

Abstract

AbstractPretrained contextual representation models (Peters et al., 2018; Devlin et al., 2018) have pushed forward the state-of-the-art on many NLP tasks. A new release of BERT (Devlin, 2018) includes a model simultaneously pretrained on 104 languages with impressive performance for zero-shot cross-lingual transfer on a natural language inference task. This paper explores the broader cross-lingual potential of mBERT (multilingual) as a zero shot language transfer model on 5 NLP tasks covering a total of 39 languages from various language families: NLI, document classification, NER, POS tagging, and dependency parsing. We compare mBERT with the best-published methods for zero-shot cross-lingual transfer and find mBERT competitive on each task. Additionally, we investigate the most effective strategy for utilizing mBERT in this manner, determine to what extent mBERT generalizes away from language specific features, and measure factors that influence cross-lingual transfer.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — zero-shot language transfer

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Shijie Wu , Mark Dredze

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning Machine Learning > Learning Types > Zero-Shot Learning Natural Language Processing > Applications > Machine Translation Natural Language Processing > Resources & Methods > Multilingual NLP Deep Learning > Learning Types > Transfer Learning

Keywords

zero-shot learning cross-lingual transfer named entity recognition dependency parsing part-of-speech tagging contextual representation multilingual bert zero-shot language transfer

Download PDF

Related papers

Read, Attend and Comment: A Deep Architecture for Automatic News Comment Generation 2019

Chains-of-Reasoning at TextGraphs 2019 Shared Task: Reasoning over Chains of Facts for Explainable Multi-hop Inference 2019

A Boundary-aware Neural Model for Nested Named Entity Recognition 2019

Iterative Dual Domain Adaptation for Neural Machine Translation 2019

A Multi-Pairwise Extension of Procrustes Analysis for Multilingual Word Translation 2019