Factorial LDA: Sparse Multi-Dimensional Text Models

Michael Paul; Mark Dredze

NIPS (NeurIPS) 2012

Abstract

Multi-dimensional latent variable models can capture the many latent factors in a text corpus, such as topic, author perspective and sentiment. We introduce factorial LDA, a multi-dimensional latent variable model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables. Our model incorporates structured word priors and learns a sparse product of factors. Experiments on research abstracts show that our model can learn latent factors such as research topic, scientific discipline, and focus (e.g., methods vs. applications). Our modeling improvements reduce test perplexity and improve human interpretability of the discovered factors.
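
To make the phrase "each word token depends on a K-dimensional vector of latent variables" concrete, the toy sampler below draws one latent tuple per token and then a word conditioned on that tuple. It is only a minimal sketch under simplifying assumptions and not the authors' model: it uses independent symmetric Dirichlet draws and omits the structured word priors and the sparse product of factors described in the abstract. The function name `sample_document` and its parameters (`factor_sizes`, `vocab_size`, `num_tokens`, `seed`) are hypothetical and chosen for illustration.

```python
import itertools
import random


def sample_document(factor_sizes, vocab_size, num_tokens, seed=0):
    """Sample a toy document in which every token carries a K-dimensional latent vector.

    factor_sizes has one entry per factor, so K = len(factor_sizes).
    """
    rng = random.Random(seed)

    # Every possible latent vector: the Cartesian product of the K factors' values.
    tuples = list(itertools.product(*(range(n) for n in factor_sizes)))

    def dirichlet(dim, alpha=1.0):
        # Symmetric Dirichlet draw via normalized Gamma variates.
        # ASSUMPTION: the abstract's structured priors are replaced by this simple prior.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(dim)]
        total = sum(draws)
        return [d / total for d in draws]

    # One word distribution per latent tuple, plus one document-level
    # distribution over tuples (no sparsity is modeled in this sketch).
    word_dists = {t: dirichlet(vocab_size) for t in tuples}
    doc_dist = dirichlet(len(tuples))

    tokens = []
    for _ in range(num_tokens):
        # Draw the K-dimensional latent vector for this token.
        z = rng.choices(tuples, weights=doc_dist, k=1)[0]
        # Draw the word id conditioned on the full vector.
        w = rng.choices(range(vocab_size), weights=word_dists[z], k=1)[0]
        tokens.append((z, w))
    return tokens
```

For example, `sample_document([3, 2, 2], vocab_size=10, num_tokens=20)` returns 20 pairs, each holding a length-3 latent tuple and a word id. The three factors could stand in for topic, discipline, and focus, though the sketch attaches no meaning to them.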

Topics

Artificial Intelligence > Bayesian & Probabilistic > Probabilistic Modeling
Machine Learning > Learning Types > Unsupervised Learning
Natural Language Processing > Resources & Methods > Text Representation
Machine Learning > Bayesian & Probabilistic > Probabilistic Modeling
Machine Learning > Core Methods > Dimensionality Reduction
Machine Learning > Core Methods > Probabilistic Modeling
Machine Learning > Learning Types > Representation Learning
Natural Language Processing > Resources & Methods > Language Modeling
Natural Language Processing > Applications > Topic Modeling

Keywords

topic modeling, text analysis, multi-dimensional models, structured word priors, sparse factorization, latent variable model, sparse model, topic model, text corpus, factorial LDA, text model
