Learning Document Representations Using Subspace Multinomial Model

Santosh Kesiraju; Lukas Burget; Igor Szőke; Jan Černocký

INTERSPEECH 2016

Abstract

The subspace multinomial model (SMM) is a log-linear model that can be used for learning low-dimensional continuous representations of discrete data. SMM and its variants have been used for speaker verification based on prosodic features and for phonotactic language recognition. In this paper, we propose a new variant of SMM that introduces sparsity, and we call the resulting model ℓ1 SMM. We show that ℓ1 SMM can be used for learning document representations that are helpful in topic identification (classification) and clustering tasks. Our experiments in document classification show that SMM achieves results comparable to those of models such as latent Dirichlet allocation and sparse topical coding, while having the useful property that the resulting document vectors are Gaussian distributed.
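The abstract describes SMM only as a log-linear model, so a minimal sketch of the usual SMM formulation may help. It assumes each document d has the word distribution θ_d = softmax(m + T w_d), where m is a V-dimensional background log-distribution, T is a V×K subspace matrix, and w_d is the K-dimensional document vector. The objective below is an assumption for illustration: the ℓ1 penalty on T stands in for the sparsity of ℓ1 SMM, and the ℓ2 penalty on document vectors corresponds to a Gaussian prior. All function names are hypothetical. Only a single gradient-ascent step on w_d with T held fixed is shown, not the full training procedure.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # V rows (words) x K columns (subspace dimensions)


def word_distribution(m: Vector, T: Matrix, w: Vector) -> Vector:
    """Log-linear model: theta_i = exp(m_i + t_i . w) / sum_j exp(m_j + t_j . w)."""
    logits = [m_i + sum(t_ik * w_k for t_ik, w_k in zip(t_i, w))
              for m_i, t_i in zip(m, T)]
    top = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(x: Vector, m: Vector, T: Matrix, w: Vector) -> float:
    """Multinomial log-likelihood of word counts x (constant term omitted)."""
    theta = word_distribution(m, T, w)
    return sum(x_i * math.log(th) for x_i, th in zip(x, theta) if x_i > 0)


def objective(docs: List[Vector], m: Vector, T: Matrix, W: List[Vector],
              lam: float, omega: float) -> float:
    """Assumed l1-SMM objective: log-likelihood - (lam/2)*||W||^2 - omega*||T||_1."""
    ll = sum(log_likelihood(x, m, T, w) for x, w in zip(docs, W))
    l2 = 0.5 * lam * sum(w_k ** 2 for w in W for w_k in w)
    l1 = omega * sum(abs(t) for row in T for t in row)
    return ll - l2 - l1


def update_doc_vector(x: Vector, m: Vector, T: Matrix, w: Vector,
                      lam: float, lr: float = 0.1) -> Vector:
    """One gradient-ascent step on w: grad = T^T (x - N * theta) - lam * w."""
    theta = word_distribution(m, T, w)
    n = sum(x)  # total word count N of the document
    resid = [x_i - n * th for x_i, th in zip(x, theta)]
    grad = [sum(T[i][k] * resid[i] for i in range(len(x))) - lam * w[k]
            for k in range(len(w))]
    return [w_k + lr * g for w_k, g in zip(w, grad)]


# Usage: 4-word vocabulary, 2-dimensional subspace, one document.
m = [0.0, 0.0, 0.0, 0.0]  # uniform background distribution
T = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
x = [5.0, 0.0, 1.0, 0.0]  # word counts
w0 = [0.0, 0.0]
w1 = update_doc_vector(x, m, T, w0, lam=1.0)
print(w1, log_likelihood(x, m, T, w0), log_likelihood(x, m, T, w1))
```

With w = 0 and m = 0 the model reduces to the uniform distribution; after one step the document vector moves toward the subspace direction that favors the frequent first word, and the log-likelihood of the counts increases.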
