Precision-Recall Balanced Topic Modelling

Seppo Virtanen; Mark Girolami

NeurIPS 2019

Abstract

Topic models are becoming increasingly relevant probabilistic models for dimensionality reduction of text data, inferring topics that capture meaningful themes of frequently co-occurring terms. We formulate topic modelling as an information retrieval task, where the goal is to capture relevant term co-occurrence patterns based on the latent topic representation. We evaluate performance on this task rigorously with regard to two types of errors, false negatives and false positives, based on the well-known precision-recall trade-off, and provide a statistical model that allows the user to balance the contributions of the two error types. When the user focuses solely on the contribution of false negatives, ignoring false positives altogether, our proposed model reduces to a standard topic model. Extensive experiments demonstrate that the proposed approach is effective and infers more coherent topics than existing related approaches.
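To make the two error types concrete, the following is a minimal sketch in Python, not the authors' model. It counts false negatives and false positives for a set of retrieved terms against a set of relevant terms, and combines them with a weight. The function names `retrieval_errors` and `balanced_cost`, the weight `lam`, and the linear form of the cost are all assumptions introduced here for illustration; the abstract does not specify how the proposed model weights the errors.

```python
def retrieval_errors(relevant, retrieved):
    """Count retrieval errors and compute precision and recall.

    `relevant` holds the terms that should be captured; `retrieved`
    holds the terms actually returned. Both names are hypothetical.
    """
    relevant, retrieved = set(relevant), set(retrieved)
    tp = len(relevant & retrieved)   # relevant terms that were retrieved
    fn = len(relevant - retrieved)   # relevant terms that were missed
    fp = len(retrieved - relevant)   # retrieved terms that are not relevant
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fn": fn, "fp": fp,
            "precision": precision, "recall": recall}


def balanced_cost(fn, fp, lam):
    """Illustrative weighted error count (an assumption, not the paper's objective).

    lam = 1.0 counts only false negatives (recall-oriented);
    lam = 0.0 counts only false positives (precision-oriented).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * fn + (1.0 - lam) * fp


if __name__ == "__main__":
    errors = retrieval_errors(
        relevant={"topic", "model", "latent", "text"},
        retrieved={"topic", "model", "cat"},
    )
    print(errors)
    print(balanced_cost(errors["fn"], errors["fp"], lam=1.0))
    print(balanced_cost(errors["fn"], errors["fp"], lam=0.5))
```

In this toy setting, `lam = 1.0` ignores false positives entirely, which loosely mirrors the abstract's remark that focusing solely on false negatives recovers a standard topic model; the actual correspondence depends on the model defined in the paper.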

Topics

Data Science & Analytics > Methods > Data Mining
Machine Learning > Bayesian & Probabilistic > Probabilistic Modeling
Machine Learning > Core Methods > Topic Modeling
Natural Language Processing > Applications > Topic Modeling

Keywords

dimensionality reduction, probabilistic modeling, topic modeling, information retrieval, probabilistic model, topic model, precision-recall trade-off
