Bayesian Text Classification and Summarization via A Class-Specified Topic Model

Feifei Wang; Junni L. Zhang; Yichao Li; Ke Deng; Jun S. Liu

JMLR 2021

Abstract

We propose the class-specified topic model (CSTM) for text classification and class-specific text summarization. The model assumes that, in addition to a set of latent topics shared across classes, each class has its own set of class-specific latent topics. Each document is a probabilistic mixture of the shared topics and the class-specific topics associated with its class. Each topic, whether class-specific or shared, has its own probability distribution over a given dictionary. We develop Bayesian inference for CSTM in the semi-supervised scenario, with the supervised scenario as a special case. We analyze in detail the 20 Newsgroups dataset, a benchmark dataset for text classification, and demonstrate that CSTM performs better than a two-stage approach based on latent Dirichlet allocation (LDA), several existing supervised extensions of LDA, and an $L^1$-penalized logistic regression. The favorable performance of CSTM is also demonstrated through Monte Carlo simulations and an analysis of the Reuters dataset.
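The generative structure described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not specify priors, so the symmetric Dirichlet priors (`alpha`, `beta`), the uniform class labels, the fixed document length, and all function and parameter names below are hypothetical choices made for illustration.

```python
import numpy as np

def sample_corpus(n_docs=6, n_classes=2, n_shared=2, n_specific=2,
                  vocab_size=20, doc_len=30, alpha=0.5, beta=0.5, seed=0):
    """Draw a toy corpus from a CSTM-like generative process (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Each topic is a probability distribution over the dictionary.
    # Assumption: symmetric Dirichlet(beta) draws for every topic.
    shared_topics = rng.dirichlet(np.full(vocab_size, beta), size=n_shared)
    specific_topics = rng.dirichlet(np.full(vocab_size, beta),
                                    size=(n_classes, n_specific))

    # Assumption: class labels are drawn uniformly at random.
    labels = rng.integers(n_classes, size=n_docs)

    docs = []
    for c in labels:
        # A class-c document may use only its own class-specific topics
        # plus the topics shared across all classes.
        topics = np.vstack([specific_topics[c], shared_topics])
        # Document-level mixture weights over the available topics.
        theta = rng.dirichlet(np.full(topics.shape[0], alpha))
        # One topic assignment per word, then one word per assignment.
        z = rng.choice(topics.shape[0], size=doc_len, p=theta)
        words = np.array([rng.choice(vocab_size, p=topics[k]) for k in z])
        docs.append(words)
    return labels, docs, shared_topics, specific_topics
```

Calling `sample_corpus()` returns the class labels, the documents as arrays of word indices, and the two sets of topic distributions. In a semi-supervised setting, one would hide some of the labels and infer them together with the topics; that inference step is not sketched here because the abstract does not describe it.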


Authors

Feifei Wang, Junni L. Zhang, Yichao Li, Ke Deng, Jun S. Liu

Topics

Artificial Intelligence > Bayesian & Probabilistic > Bayesian Learning
Machine Learning > Core Methods > Classification
Machine Learning > Learning Types > Semi-Supervised Learning
Natural Language Processing > Applications > Text Classification
Machine Learning > Bayesian & Probabilistic > Bayesian Inference
Natural Language Processing > Applications > Summarization

Keywords

semi-supervised learning, Bayesian inference, text classification, latent Dirichlet allocation, text summarization, topic model
