2020 AAAI AAAI 2020

Verb Class Induction with Partial Supervision

Abstract

Abstract Dirichlet-multinomial (D-M) mixtures like latent Dirichlet allocation (LDA) are widely used for both topic modeling and clustering. Prior work on constructing Levin-style semantic verb clusters achieves state-of-the-art results using D-M mixtures for verb sense induction and clustering. We add a bias toward known clusters by explicitly labeling a small number of observations with their correct VerbNet class. We demonstrate that this partial supervision guides the resulting clusters effectively, improving the recovery of both labeled and unlabeled classes by 16%, for a joint 12% absolute improvement in F1 score compared to clustering without supervision. The resulting clusters are also more semantically coherent. Although the technical change is minor, it produces a large effect, with important practical consequences for supervised topic modeling in general.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing
🧭 Keyword Pioneer — verb class induction
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Security & Privacy, Speech & Audio