Multi-lingual Common Semantic Space Construction via Cluster-consistent Word Embedding

Lifu Huang; Kyunghyun Cho; Boliang Zhang; Heng Ji; Kevin Knight

EMNLP 2018

Abstract

We construct a multilingual common semantic space based on distributional semantics, where words from multiple languages are projected into a shared space through which all available resources and knowledge can be shared across languages. Beyond word alignment, we introduce multiple cluster-level alignments and require word clusters to be distributed consistently across languages. We exploit three signals for clustering: (1) neighbor words in the monolingual word embedding space; (2) character-level information; and (3) linguistic properties (e.g., apposition, locative suffix) derived from linguistic structure knowledge bases available for thousands of languages. We introduce a new cluster-consistent correlational neural network to construct the common semantic space by aligning words as well as clusters. Intrinsic evaluation on monolingual and multilingual QVEC tasks shows that our approach achieves significantly higher correlation than state-of-the-art multilingual embedding learning methods with linguistic features extracted from manually crafted lexical resources. Using low-resource language name tagging as a case study for extrinsic evaluation, our approach achieves up to a 14.6% absolute F-score gain over the state of the art on cross-lingual direct transfer. Our approach also remains robust even when the bilingual dictionary is small.
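The idea of aligning clusters as well as individual words can be illustrated with a toy objective. What follows is a minimal sketch under stated assumptions, not the paper's cluster-consistent correlational neural network: it assumes the embeddings of both languages have already been projected into the shared space, replaces the correlation-based training objective with plain squared Euclidean distances, and uses hypothetical helper names (`word_alignment_loss`, `cluster_alignment_loss`, `combined_loss`), a hypothetical weight `lam`, and made-up 2-D vectors and cluster labels for a tiny English/Turkish example.

```python
# Toy word- and cluster-level alignment objective (hypothetical simplification).
# NOT the paper's cluster-consistent correlational neural network: it assumes
# both languages are already projected into a shared space and scores them
# with plain squared Euclidean distances instead of a correlation objective.
from typing import Dict, List, Sequence, Set, Tuple

Vector = List[float]
Embeddings = Dict[str, Vector]
Clusters = Dict[str, Set[str]]  # hypothetical cluster label -> member words


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def word_alignment_loss(src: Embeddings, tgt: Embeddings,
                        dictionary: List[Tuple[str, str]]) -> float:
    """Mean squared distance between bilingual dictionary pairs."""
    return sum(sq_dist(src[s], tgt[t]) for s, t in dictionary) / len(dictionary)


def cluster_alignment_loss(src: Embeddings, tgt: Embeddings,
                           src_clusters: Clusters, tgt_clusters: Clusters) -> float:
    """Mean squared distance between centroids of clusters sharing a label."""
    shared = sorted(set(src_clusters) & set(tgt_clusters))
    if not shared:
        return 0.0
    total = 0.0
    for label in shared:
        c_src = centroid([src[w] for w in sorted(src_clusters[label])])
        c_tgt = centroid([tgt[w] for w in sorted(tgt_clusters[label])])
        total += sq_dist(c_src, c_tgt)
    return total / len(shared)


def combined_loss(src: Embeddings, tgt: Embeddings,
                  dictionary: List[Tuple[str, str]],
                  src_clusters: Clusters, tgt_clusters: Clusters,
                  lam: float = 1.0) -> float:
    """Word-level term plus a hypothetical weight `lam` times the cluster term."""
    return (word_alignment_loss(src, tgt, dictionary)
            + lam * cluster_alignment_loss(src, tgt, src_clusters, tgt_clusters))


# Made-up projected vectors: ev = house, okul = school, su = water.
en = {"house": [1.0, 0.0], "school": [0.0, 1.0], "water": [2.0, 2.0]}
tr = {"ev": [1.0, 0.5], "okul": [0.0, 1.0], "su": [2.0, 3.0]}
dictionary = [("house", "ev"), ("school", "okul")]  # "water"/"su" deliberately missing
en_clusters = {"building": {"house", "school"}, "liquid": {"water"}}
tr_clusters = {"building": {"ev", "okul"}, "liquid": {"su"}}

print(word_alignment_loss(en, tr, dictionary))                      # 0.125
print(cluster_alignment_loss(en, tr, en_clusters, tr_clusters))     # 0.53125
print(combined_loss(en, tr, dictionary, en_clusters, tr_clusters))  # 0.65625
```

In this toy setup "water"/"su" is absent from the dictionary, so the word-level term ignores it, while the cluster-level term still penalizes the gap between the two "liquid" centroids. This is one plausible intuition for why cluster-level alignment could help when the bilingual dictionary is small; in the paper the clusters come from the three signals listed in the abstract rather than from hand-picked labels.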

Topics

Machine Learning > Core Methods > Clustering
Machine Learning > Core Methods > Representation Learning
Machine Learning > Core Methods > Embedding Learning
Natural Language Processing > Resources & Methods > Multilingual NLP
Natural Language Processing > Resources & Methods > Text Representation
Interdisciplinary > Linguistics > Computational Linguistics
Deep Learning > Learning Types > Representation Learning
Machine Learning > Learning Types > Multi-Lingual Learning

Keywords

cross-lingual transfer, word alignment, semantic space, low-resource language, word embedding, multilingual embedding, multilingual word embedding, cluster-consistent word embedding, common semantic space
