Unsupervised Labeled Parsing with Deep Inside-Outside Recursive Autoencoders

Andrew Drozdov; Patrick Verga; Yi-Pei Chen; Mohit Iyyer; Andrew McCallum

2019 IJCNLP IJCNLP 2019

Unsupervised Labeled Parsing with Deep Inside-Outside Recursive Autoencoders

Abstract

AbstractUnderstanding text often requires identifying meaningful constituent spans such as noun phrases and verb phrases. In this work, we show that we can effectively recover these types of labels using the learned phrase vectors from deep inside-outside recursive autoencoders (DIORA). Specifically, we cluster span representations to induce span labels. Additionally, we improve the model’s labeling accuracy by integrating latent code learning into the training procedure. We evaluate this approach empirically through unsupervised labeled constituency parsing. Our method outperforms ELMo and BERT on two versions of the Wall Street Journal (WSJ) dataset and is competitive to prior work that requires additional human annotations, improving over a previous state-of-the-art system that depends on ground-truth part-of-speech tags by 5 absolute F1 points (19% relative error reduction).

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — label clustering

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Andrew Drozdov , Patrick Verga , Yi-Pei Chen , Mohit Iyyer , Andrew McCallum

Topics

Machine Learning > Learning Types > Unsupervised Learning Deep Learning > Architectures > Autoencoders Natural Language Processing > Understanding > Parsing

Keywords

unsupervised parsing constituency parsing label clustering span representation recursive autoencoder phrase vector deep inside-outside recursive autoencoder

Download PDF

Related papers

Fine-grained Knowledge Fusion for Sequence Labeling Domain Adaptation 2019

Exploiting Monolingual Data at Scale for Neural Machine Translation 2019

Distributionally Robust Language Modeling 2019

Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling 2019

ARAML: A Stable Adversarial Training Framework for Text Generation 2019