Learning a Concept Hierarchy from Multi-labeled Documents

Viet-An Nguyen; Jordan L. Ying; Philip Resnik; Jonathan Chang

NIPS (NeurIPS) 2014

Abstract

While topic models can discover patterns of word usage in large corpora, it is difficult to meld this unsupervised structure with noisy, human-provided labels, especially when the label space is large. In this paper, we present a model, Label to Hierarchy (L2H), that can induce a hierarchy of user-generated labels and the topics associated with those labels from a set of multi-labeled documents. The model is robust enough to account for labels left missing by untrained, disparate annotators, and it provides an interpretable summary of an otherwise unwieldy label set. We show empirically that L2H is effective at predicting held-out words and labels for unseen documents.

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Core Methods > Clustering
Machine Learning > Core Methods > Representation Learning
Machine Learning > Learning Types > Unsupervised Learning
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Text Representation
Data Science & Analytics > Methods > Data Mining
Machine Learning > Learning Paradigms > Unsupervised Learning
Natural Language Processing > Applications > Topic Modeling

Keywords

unsupervised learning, probabilistic modeling, multi-label classification, document clustering, hierarchical representation, document classification, document analysis, concept hierarchy, topic model, hierarchy learning

Related papers

Information-based learning by agents in unbounded state spaces 2014

Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm 2014

Partition-wise Linear Models 2014

Active Regression by Stratification 2014

Cone-Constrained Principal Component Analysis 2014