2009 NIPS NeurIPS 2009

Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

Abstract

We present a nonparametric hierarchical Bayesian model of document collections that decouples sparsity and smoothness in the component distributions (i.e., the ``topics). In the sparse topic model (STM), each topic is represented by a bank of selector variables that determine which terms appear in the topic. Thus each topic is associated with a subset of the vocabulary, and topic smoothness is modeled on this subset. We develop an efficient Gibbs sampler for the STM that includes a general-purpose method for sampling from a Dirichlet mixture with a combinatorial number of components. We demonstrate the STM on four real-world datasets. Compared to traditional approaches, the empirical results show that STMs give better predictive performance with simpler inferred models.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing
📈 Trend Setter — Text Representation
🧭 Keyword Pioneer — sparse topic model
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing
🐣 Hot Topic Early Bird — sparse coding