A Novel Two-Step Method for Cross Language Representation Learning

Min Xiao; Yuhong Guo

NIPS (NeurIPS) 2013

Abstract

Cross language text classification is an important learning task in natural language processing. A critical challenge of cross language learning is that words of different languages lie in disjoint feature spaces. In this paper, we propose a two-step representation learning method to bridge the feature spaces of different languages by exploiting a set of parallel bilingual documents. Specifically, we first formulate a matrix completion problem to produce a complete parallel document-term matrix for all documents in two languages, and then induce a cross-lingual document representation by applying latent semantic indexing on the obtained matrix. We use a projected gradient descent algorithm to solve the formulated matrix completion problem with convergence guarantees. The proposed approach is evaluated in a set of experiments on cross language sentiment classification tasks using Amazon product reviews. The experimental results demonstrate that the proposed learning approach outperforms a number of competing cross language representation learning methods, especially when the number of parallel bilingual documents is small.
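The two steps named in the abstract can be illustrated with a small sketch. This is not the authors' formulation: the abstract does not give the objective, so the completion step below substitutes a simple heuristic (a gradient step on the squared error over observed entries, followed by a truncated-SVD rank projection and clipping to nonnegative values). The toy matrix, the function names `complete_matrix` and `lsi_embed`, and the parameters `rank`, `step`, `iters`, and `k` are all assumptions made for illustration.

```python
import numpy as np

def complete_matrix(M, observed, rank=2, step=1.0, iters=200):
    """Heuristic completion (an assumption, not the paper's algorithm):
    gradient step on the observed-entry squared error, then a truncated-SVD
    rank projection, then clipping to nonnegative values."""
    X = np.where(observed, M, 0.0)
    for _ in range(iters):
        # Gradient of 0.5 * ||P_observed(X - M)||_F^2 with respect to X.
        grad = np.where(observed, X - M, 0.0)
        X = X - step * grad
        # Keep only the top `rank` singular directions.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Term counts cannot be negative.
        X = np.maximum(X, 0.0)
    return X

def lsi_embed(X, k=2):
    """Latent semantic indexing: k-dimensional document vectors from a truncated SVD."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

# Hypothetical toy data: columns 0-2 are terms of language A, columns 3-5 of language B.
full = np.array([
    [2, 1, 0, 2, 1, 0],  # parallel document (both languages observed)
    [0, 0, 3, 0, 0, 3],  # parallel document (both languages observed)
    [1, 2, 0, 1, 2, 0],  # language A only
    [0, 1, 2, 0, 1, 2],  # language A only
    [2, 2, 0, 2, 2, 0],  # language B only
    [0, 0, 1, 0, 0, 1],  # language B only
], dtype=float)

observed = np.ones_like(full, dtype=bool)
observed[2:4, 3:] = False  # language-B half is missing for A-only documents
observed[4:6, :3] = False  # language-A half is missing for B-only documents
M = np.where(observed, full, 0.0)

X_hat = complete_matrix(M, observed, rank=2)  # step 1: fill in the missing halves
Z = lsi_embed(X_hat, k=2)                     # step 2: shared low-dimensional representation
print(Z.round(2))
```

Each row of `Z` is a document vector in a space shared by both languages, so a classifier trained on rows from one language could in principle be applied to rows from the other.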

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Core Methods > Representation Learning
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Multilingual NLP
Natural Language Processing > Resources & Methods > Text Representation
Machine Learning > Learning Types > Transfer Learning
Machine Learning > Core Methods > Matrix Factorization
Deep Learning > Learning Types > Representation Learning

Keywords

representation learning; text classification; cross-lingual representation; matrix completion; cross language representation learning; latent semantic indexing; cross-lingual document representation; bilingual text classification; cross language classification; parallel bilingual document; cross language learning
