Topic Modeling with Wasserstein Autoencoders

Feng Nan; Ran Ding; Ramesh Nallapati; Bing Xiang

ACL 2019

Abstract

We propose a novel neural topic model in the Wasserstein autoencoder (WAE) framework. Unlike existing variational-autoencoder-based models, we directly enforce a Dirichlet prior on the latent document-topic vectors. We exploit the structure of the latent space and apply a suitable kernel when minimizing the Maximum Mean Discrepancy (MMD) to perform distribution matching. We find that MMD performs much better than a Generative Adversarial Network (GAN) at matching high-dimensional Dirichlet distributions. We further find that incorporating randomness in the encoder output during training leads to significantly more coherent topics. To measure the diversity of the produced topics, we propose a simple topic uniqueness metric. Together with the widely used coherence measure NPMI, it offers a more holistic evaluation of topic quality. Experiments on several real datasets show that our model produces significantly better topics than existing topic models.
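The distribution-matching idea in the abstract can be illustrated with a small, standard-library-only sketch that estimates squared MMD between samples from a Dirichlet prior and a stand-in for encoder outputs. This is a hedged illustration rather than the authors' implementation: the Gaussian RBF kernel is a generic substitute for the "suitable kernel" that the abstract does not name, and the function names, the concentration value 0.1, the sample size, and the number of topics are all assumptions made for the example.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel; an assumed stand-in for the paper's kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, kernel=rbf_kernel):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    def mean_kernel(a, b):
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


rng = random.Random(0)
num_topics = 5          # hypothetical number of topics
alpha = [0.1] * num_topics  # hypothetical sparse Dirichlet concentration

# Samples from the prior, a second batch from the same prior, and a
# degenerate "encoder" that always outputs the uniform topic mixture.
prior = [sample_dirichlet(alpha, rng) for _ in range(200)]
matched = [sample_dirichlet(alpha, rng) for _ in range(200)]
collapsed = [[1.0 / num_topics] * num_topics for _ in range(200)]

mmd_matched = mmd_squared(prior, matched)
mmd_collapsed = mmd_squared(prior, collapsed)
print("MMD^2 (matched):  ", mmd_matched)
print("MMD^2 (collapsed):", mmd_collapsed)
```

In this toy setting the estimate is close to zero when both batches come from the same Dirichlet distribution and noticeably larger when the second batch ignores the prior, which is the signal an MMD penalty would use to pull encoder outputs toward the prior.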

Topics

Machine Learning > Core Methods > Clustering
Machine Learning > Core Methods > Representation Learning
Machine Learning > Learning Types > Unsupervised Learning
Deep Learning > Architectures > Autoencoders
Deep Learning > Models > Variational Inference
Machine Learning > Core Methods > Probabilistic Modeling
Machine Learning > Bayesian & Probabilistic > Variational Inference
Machine Learning > Core Methods > Topic Modeling

Keywords

variational inference, maximum mean discrepancy, topic coherence, topic model, neural topic model, Dirichlet prior, Wasserstein autoencoder
