2010
NIPS
NeurIPS 2010
Word Features for Latent Dirichlet Allocation
Abstract
We extend Latent Dirichlet Allocation (LDA) by explicitly allowing for the encoding of side information in the distribution over words. This results in a variety of new capabilities, such as improved estimates for infrequently occurring words, as well as the ability to leverage thesauri and dictionaries in order to boost topic cohesion within and across languages. We present experiments on multi-language topic synchronisation where dictionary information is used to bias corresponding words towards similar topics. Results indicate that our model substantially improves topic cohesion when compared to the standard LDA model.
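One way to picture "dictionary information is used to bias corresponding words towards similar topics" is as a change to the Dirichlet prior over words in each topic. The sketch below is illustrative only and is not the paper's parameterisation: the function `couple_prior`, the `strength` parameter, the toy vocabulary, and the pseudo-count values are all hypothetical. It assumes each word appears in at most one dictionary pair, and it only shows the prior, not inference.

```python
import numpy as np


def couple_prior(beta, pairs, strength=0.5):
    """Pull the Dirichlet pseudo-counts of dictionary-linked words together.

    beta:     (K, V) array of positive per-topic word pseudo-counts.
    pairs:    iterable of (i, j) vocabulary indices linked by a dictionary.
              Assumption: each word index appears in at most one pair.
    strength: 0 leaves beta unchanged; 1 sets both linked words to their mean.
    """
    beta = np.asarray(beta, dtype=float)
    out = beta.copy()
    for i, j in pairs:
        mean = 0.5 * (beta[:, i] + beta[:, j])
        # Interpolate each word's pseudo-count toward the pair mean;
        # the combined mass of the pair is unchanged.
        out[:, i] = (1.0 - strength) * beta[:, i] + strength * mean
        out[:, j] = (1.0 - strength) * beta[:, j] + strength * mean
    return out


# Hypothetical bilingual vocabulary: two English words and their German translations.
vocab = ["dog", "house", "Hund", "Haus"]
pairs = [(0, 2), (1, 3)]  # dog <-> Hund, house <-> Haus

# Hypothetical prior: topic 0 favours "dog", topic 1 favours "house".
beta = np.array([[5.0, 1.0, 1.0, 1.0],
                 [1.0, 5.0, 1.0, 1.0]])

# With strength=1 each translation pair shares its prior mass equally:
# [[3, 1, 3, 1], [1, 3, 1, 3]]
coupled = couple_prior(beta, pairs, strength=1.0)

# Draw one topic-word distribution per topic from the coupled prior.
rng = np.random.default_rng(0)
phi = np.vstack([rng.dirichlet(row) for row in coupled])
```

Under the coupled prior, a topic that favours "dog" also favours "Hund" before any data is seen, which is the intuition behind synchronising topics across languages. A symmetric prior, as in standard LDA, would treat all four words identically.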
Authors
Topics
Artificial Intelligence > Bayesian & Probabilistic > Probabilistic Modeling
Machine Learning > Learning Types > Unsupervised Learning
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Text Representation
Machine Learning > Core Methods > Topic Modeling
Natural Language Processing > Applications > Topic Modeling
Machine Learning > Learning Types > Topic Modeling