2022
NAACL
NAACL 2022
Is Neural Topic Modelling Better than Clustering? An Empirical Study on Clustering with Contextual Embeddings for Topics
Abstract
AbstractRecent work incorporates pre-trained word embeddings such as BERT embeddings into Neural Topic Models (NTMs), generating highly coherent topics. However, with high-quality contextualized document representations, do we really need sophisticated neural models to obtain coherent and interpretable topics? In this paper, we conduct thorough experiments showing that directly clustering high-quality sentence embeddings with an appropriate word selecting method can generate more coherent and diverse topics than NTMs, achieving also higher efficiency and simplicity.
❓
The Questioner
🌉
Interdisciplinary Bridge
— Data Science & Analytics and Machine Learning and Natural Language Processing
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio
Authors
Topics
Machine Learning > Core Methods > Clustering
Machine Learning > Core Methods > Representation Learning
Natural Language Processing > Generation > Language Modeling
Natural Language Processing > Applications > Text Classification
Data Science & Analytics > Applications > Clustering
Machine Learning > Learning Types > Representation Learning