A multilabel approach to morphosyntactic probing

Naomi Shapiro; Amandalynne Paullada; Shane Steinert-Threlkeld

EMNLP 2021

Abstract

We propose using a multilabel probing task to assess the morphosyntactic representations of multilingual word embeddings. This tweak on canonical probing makes it easy to explore morphosyntactic representations, both holistically and at the level of individual features (e.g., gender, number, case), and leads more naturally to the study of how language models handle co-occurring features (e.g., agreement phenomena). We demonstrate this task with multilingual BERT (Devlin et al., 2018), training probes for seven typologically diverse languages: Afrikaans, Croatian, Finnish, Hebrew, Korean, Spanish, and Turkish. Through this simple but robust paradigm, we verify that multilingual BERT renders many morphosyntactic features simultaneously extractable. We further evaluate the probes on six held-out languages: Arabic, Chinese, Marathi, Slovenian, Tagalog, and Yoruba. This zero-shot style of probing has the added benefit of revealing which cross-linguistic properties a language model recognizes as being shared by multiple languages.
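To make the multilabel framing concrete, the sketch below shows one way the probe targets could be encoded: each word's morphosyntactic annotation becomes a single multi-hot vector, so one probe predicts all features at once and can still be scored feature by feature. This is only an illustration under stated assumptions, not the authors' implementation. The `Feature=Value|Feature=Value` tag format, the `_` placeholder for "no features", and all function names here are hypothetical choices that the abstract does not specify.

```python
from typing import Dict, List, Set


def parse_feats(feats: str) -> Set[str]:
    """Split a 'Gender=Fem|Number=Sing' style tag into a set of labels.

    The tag format is an assumption made for this sketch; '_' or an empty
    string is treated as "no morphosyntactic features".
    """
    if not feats or feats == "_":
        return set()
    return set(feats.split("|"))


def build_label_index(all_feats: List[str]) -> Dict[str, int]:
    """Map every feature-value label seen in the data to a column index."""
    labels = sorted({label for f in all_feats for label in parse_feats(f)})
    return {label: i for i, label in enumerate(labels)}


def multi_hot(feats: str, index: Dict[str, int]) -> List[int]:
    """Encode one word's features as a multi-hot target vector.

    Labels missing from the index (e.g. unseen in a held-out language)
    are skipped rather than raising an error.
    """
    vec = [0] * len(index)
    for label in parse_feats(feats):
        if label in index:
            vec[index[label]] = 1
    return vec


def per_label_accuracy(gold: List[List[int]], pred: List[List[int]]) -> List[float]:
    """Score each label column separately, giving a per-feature view."""
    n = len(gold)
    n_labels = len(gold[0])
    return [
        sum(g[j] == p[j] for g, p in zip(gold, pred)) / n
        for j in range(n_labels)
    ]


# Toy annotations for three words (hypothetical data).
tags = ["Gender=Fem|Number=Sing", "Number=Plur", "_"]
index = build_label_index(tags)
targets = [multi_hot(t, index) for t in tags]
```

In a setup like this, a probe would take a word embedding as input and emit one score per column of `index`. Because several columns can be active for the same word, co-occurring features are represented in a single target rather than in separate single-label tasks.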

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Optimization & Theory > Learning Theory
Natural Language Processing > Resources & Methods > Multilingual NLP
Interdisciplinary > Linguistics > Computational Linguistics
Deep Learning > Learning Types > Representation Learning
Artificial Intelligence > Core AI > Natural Language Processing

Keywords

feature extraction, cross-linguistic analysis, word embedding, multilingual BERT, multilingual embedding, zero-shot probing, morphosyntactic probing
