Weakly-Supervised Concept-based Adversarial Learning for Cross-lingual Word Embeddings

Haozhou Wang; James Henderson; Paola Merlo

2019 IJCNLP IJCNLP 2019

Weakly-Supervised Concept-based Adversarial Learning for Cross-lingual Word Embeddings

Abstract

AbstractDistributed representations of words which map each word to a continuous vector have proven useful in capturing important linguistic information not only in a single language but also across different languages. Current unsupervised adversarial approaches show that it is possible to build a mapping matrix that aligns two sets of monolingual word embeddings without high quality parallel data, such as a dictionary or a sentence-aligned corpus. However, without an additional step of refinement, the preliminary mapping learnt by these methods is unsatisfactory, leading to poor performance for typologically distant languages. In this paper, we propose a weakly-supervised adversarial training method to overcome this limitation, based on the intuition that mapping across languages is better done at the concept level than at the word level. We propose a concept-based adversarial training method which improves the performance of previous unsupervised adversarial methods for most languages, and especially for typologically distant language pairs.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — concept alignment

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Haozhou Wang , James Henderson , Paola Merlo

Topics

Machine Learning > Learning Types > Adversarial Learning Machine Learning > Learning Types > Weakly Supervised Learning Natural Language Processing > Resources & Methods > Lexical Semantics

Keywords

adversarial training concept alignment cross-lingual word embedding word embedding mapping typologically distant language

Download PDF

Related papers

Fine-grained Knowledge Fusion for Sequence Labeling Domain Adaptation 2019

Exploiting Monolingual Data at Scale for Neural Machine Translation 2019

Distributionally Robust Language Modeling 2019

Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling 2019

ARAML: A Stable Adversarial Training Framework for Text Generation 2019