Symmetric Correspondence Topic Models for Multilingual Text Analysis

Kosuke Fukumasu; Koji Eguchi; Eric P. Xing

NIPS 2012 (NeurIPS 2012)

Abstract

Topic modeling is a widely used approach to analyzing large text collections. A small number of multilingual topic models have recently been explored to discover latent topics among parallel or comparable documents, such as those in Wikipedia. Other topic models that were originally proposed for structured data are also applicable to multilingual documents. Correspondence Latent Dirichlet Allocation (CorrLDA) is one such model; however, it requires a pivot language to be specified in advance. We propose a new topic model, Symmetric Correspondence LDA (SymCorrLDA), which extends CorrLDA by incorporating a hidden variable that controls the pivot language. In experiments on two multilingual comparable datasets extracted from Wikipedia, we demonstrate that SymCorrLDA is more effective than some existing multilingual topic models.

Topics

Machine Learning > Core Methods > Clustering
Machine Learning > Core Methods > Representation Learning
Natural Language Processing > Resources & Methods > Multilingual NLP
Interdisciplinary > Linguistics > Computational Linguistics
Machine Learning > Bayesian & Probabilistic > Probabilistic Modeling
Machine Learning > Core Methods > Dimensionality Reduction
Machine Learning > Core Methods > Probabilistic Modeling
Natural Language Processing > Resources & Methods > Language Modeling
Natural Language Processing > Applications > Topic Modeling

Keywords

latent dirichlet allocation; text mining; multilingual nlp; topic modeling; wikipedia; comparable corpora; multilingual text; correspondence models; parallel documents; probabilistic model; topic model; structured prior; correspondence analysis; correspondence topic model
