Document Informed Neural Autoregressive Topic Models with Distributional Prior

Pankaj Gupta; Yatin Chaudhary; Florian Buettner; Hinrich Schütze

AAAI 2019

Abstract

We address two challenges in topic models. (1) Context information around words helps in determining their actual meaning, e.g., “networks” used in the contexts of artificial neural networks vs. biological neuron networks. Generative topic models infer topic-word distributions, taking little or no context into account. Here, we extend a neural autoregressive topic model to exploit the full context information around words in a document in a language modeling fashion. The proposed model is named iDocNADE. (2) Due to the small number of word occurrences (i.e., lack of context) in short texts and data sparsity in a corpus of few documents, applying topic models to such texts is challenging. Therefore, we propose a simple and efficient way of incorporating external knowledge into neural autoregressive topic models: we use embeddings as a distributional prior. The proposed variants are named DocNADEe and iDocNADEe. We present novel neural autoregressive topic model variants that consistently outperform state-of-the-art generative topic models in terms of generalization, interpretability (topic coherence) and applicability (retrieval and classification) over 7 long-text and 8 short-text datasets from diverse domains.
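
The abstract gives no equations, so the following is only a minimal sketch of the two ideas it names, written under stated assumptions. It assumes a NADE-style factorization in which each word is predicted from a hidden state built from the preceding words, an embedding prior that is added to that hidden state with a mixing weight, and a bidirectional variant that averages a left-to-right and a right-to-left pass with all parameters shared. The function names, the `tanh` activation, the weight `lam`, and the matrix shapes are hypothetical choices made for illustration and are not taken from the paper.

```python
import math
import random


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def doc_log_likelihood(doc, W, U, b, c, E=None, lam=0.0):
    """Autoregressive log p(doc) = sum_i log p(v_i | v_<i).

    doc: list of word ids; W, E: H x V matrices (hidden x vocab);
    U: V x H output matrix; b: V output biases; c: H hidden biases.
    E and lam are the ASSUMED embedding prior and its mixing weight.
    """
    H, V = len(c), len(b)
    acc = list(c)  # pre-activation built from the words seen so far
    total = 0.0
    for word in doc:
        h = [math.tanh(a) for a in acc]  # depends only on preceding words
        scores = [b[w] + sum(U[w][j] * h[j] for j in range(H)) for w in range(V)]
        total += log_softmax(scores)[word]
        for j in range(H):  # now fold the current word into the context
            acc[j] += W[j][word]
            if E is not None:
                acc[j] += lam * E[j][word]
    return total


def bidirectional_log_likelihood(doc, W, U, b, c, E=None, lam=0.0):
    """Average of a left-to-right and a right-to-left pass.

    Simplifying assumption: both directions share every parameter.
    """
    fwd = doc_log_likelihood(doc, W, U, b, c, E, lam)
    bwd = doc_log_likelihood(doc[::-1], W, U, b, c, E, lam)
    return 0.5 * (fwd + bwd)


def random_matrix(rows, cols):
    """Small random matrix used only for the toy example below."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    V, H = 5, 3  # toy vocabulary and hidden sizes
    W, E, U = random_matrix(H, V), random_matrix(H, V), random_matrix(V, H)
    b, c = [0.0] * V, [0.0] * H
    doc = [0, 3, 3, 1]
    print(doc_log_likelihood(doc, W, U, b, c))
    print(bidirectional_log_likelihood(doc, W, U, b, c, E, lam=0.5))
```

In this sketch the embedding matrix `E` would stay fixed (for example, pretrained vectors) while `W` is learned, which is one plausible reading of "distributional prior"; with `lam=0.0` the function reduces to the plain autoregressive model.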

Topics

Machine Learning > Core Methods > Representation Learning
Deep Learning > Models > Generative Models
Deep Learning > Models > Variational Inference
Natural Language Processing > Applications
Natural Language Processing > Applications > Text Classification
Machine Learning > Bayesian & Probabilistic > Probabilistic Modeling
Deep Learning > Models > Neural Networks
Machine Learning > Core Methods > Topic Modeling

Keywords

variational inference, text classification, document modeling, topic coherence, neural autoregressive model, topic model, document embedding, distributional prior, neural autoregressive topic model
