Short Text Topic Modeling with Topic Distribution Quantization and Negative Sampling Decoder

Xiaobao Wu; Chunping Li; Yan Zhu; Yishu Miao

2020 EMNLP EMNLP 2020

Short Text Topic Modeling with Topic Distribution Quantization and Negative Sampling Decoder

Abstract

AbstractTopic models have been prevailing for many years on discovering latent semantics while modeling long documents. However, for short texts they generally suffer from data sparsity because of extremely limited word co-occurrences; thus tend to yield repetitive or trivial topics with low quality. In this paper, to address this issue, we propose a novel neural topic model in the framework of autoencoding with a new topic distribution quantization approach generating peakier distributions that are more appropriate for modeling short texts. Besides the encoding, to tackle this issue in terms of decoding, we further propose a novel negative sampling decoder learning from negative samples to avoid yielding repetitive topics. We observe that our model can highly improve short text topic modeling performance. Through extensive experiments on real-world datasets, we demonstrate our model can outperform both strong traditional and neural baselines under extreme data sparsity scenes, producing high-quality topics.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — neural topic modeling

🐝 Cross-Pollinator — Computer Vision, Deep Learning, Machine Learning, Natural Language Processing

Authors

Xiaobao Wu , Chunping Li , Yan Zhu , Yishu Miao

Topics

Machine Learning > Core Methods > Representation Learning Machine Learning > Learning Types > Unsupervised Learning Natural Language Processing > Resources & Methods > Text Representation

Keywords

latent semantics neural topic modeling topic distribution quantization negative sampling decoder autoencoding framework short text modeling

Download PDF

Related papers

Fast semantic parsing with well-typedness guarantees 2020

Detecting Objectifying Language in Online Professor Reviews 2020

Analogous Process Structure Induction for Sub-event Sequence Prediction 2020

Aspect Sentiment Classification with Aspect-Specific Opinion Spans 2020

Robust and Interpretable Grounding of Spatial References with Relation Networks 2020