Cross-linguistic Comparison of Linguistic Feature Encoding in BERT Models for Typologically Different Languages

Yulia Otmakhova; Karin Verspoor; Jey Han Lau

NAACL 2022

Abstract

Though there has recently been increased interest in how pre-trained language models encode different linguistic features, there is still a lack of systematic comparison between languages with different morphology and syntax. In this paper, using BERT as an example of a pre-trained model, we compare how three typologically different languages (English, Korean, and Russian) encode morphological and syntactic features across different layers. In particular, we contrast languages that differ in a particular aspect, such as flexibility of word order, head directionality, morphological type, presence of grammatical gender, and morphological richness, across four different tasks.
