A Challenge Set and Methods for Noun-Verb Ambiguity

Ali Elkahky; Kellie Webster; Daniel Andor; Emily Pitler

EMNLP 2018

Abstract

English part-of-speech taggers regularly make egregious errors related to noun-verb ambiguity, despite having achieved 97%+ accuracy on the WSJ Penn Treebank since 2002. These mistakes have been difficult to quantify and make taggers less useful to downstream tasks such as translation and text-to-speech synthesis. This paper creates a new dataset of over 30,000 naturally-occurring non-trivial examples of noun-verb ambiguity. Taggers within 1% of each other when measured on the WSJ have accuracies ranging from 57% to 75% on this challenge set. Enhancing the strongest existing tagger with contextual word embeddings and targeted training data improves its accuracy to 89%, a 14% absolute (52% relative) improvement. Downstream, using just this enhanced tagger yields a 28% reduction in error over the prior best learned model for homograph disambiguation for text-to-speech synthesis.
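The abstract quotes both an absolute and a relative improvement. As a minimal sketch of how the two relate, the snippet below assumes "relative" means relative error reduction (the accuracy gain divided by the error rate before the change); the abstract does not state its definition, and the function name `error_reduction` is hypothetical. With the rounded accuracies quoted above (75% and 89%) the sketch gives roughly 56% rather than the reported 52%, so the reported figure was presumably computed from different or unrounded accuracies; that explanation is an assumption, not something the abstract confirms.

```python
from typing import Tuple


def error_reduction(acc_before: float, acc_after: float) -> Tuple[float, float]:
    """Return (absolute gain in accuracy points, relative error reduction).

    Accuracies are percentages in [0, 100]. The relative figure is the
    share of the original errors that the new system removes (assumed
    definition, not taken from the paper).
    """
    if not (0.0 <= acc_before < 100.0 and 0.0 <= acc_after <= 100.0):
        raise ValueError("accuracies must be percentages, with acc_before < 100")
    absolute = acc_after - acc_before
    error_before = 100.0 - acc_before
    relative = absolute / error_before
    return absolute, relative


# Rounded accuracies from the abstract: strongest existing tagger vs. enhanced tagger.
absolute, relative = error_reduction(75.0, 89.0)
print(f"{absolute:.0f} points absolute, {relative:.0%} relative")
```

Reporting the relative figure alongside the absolute one is useful on a challenge set like this because the baseline error rate is large, so the same absolute gain would mean something quite different on the WSJ, where error is already under 3%.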

Topics

Machine Learning > Core Methods > Classification
Natural Language Processing > Understanding > Part-of-Speech Tagging
Machine Learning > Learning Types > Representation Learning
Natural Language Processing > Resources & Methods > Language Modeling
Deep Learning > Learning Types > Representation Learning
Natural Language Processing > Applications > Natural Language Processing

Keywords

word sense disambiguation, part-of-speech tagging, text-to-speech synthesis, homograph disambiguation, contextual word embedding, challenge set, noun-verb ambiguity
