2018
EMNLP
Mind the Gap: Data Enrichment in Dependency Parsing of Elliptical Constructions
Abstract
In this paper, we focus on parsing rare and non-trivial constructions, in particular ellipsis. We report on several experiments in enrichment of training data for this specific construction, evaluated on five languages: Czech, English, Finnish, Russian and Slovak. These data enrichment methods draw upon self-training and tri-training, combined with a stratified sampling method mimicking the structural complexity of the original treebank. In addition, using these same methods, we also demonstrate small improvements over the CoNLL-17 parsing shared task winning system for four of the five languages, not only restricted to the elliptical constructions.
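The abstract names tri-training and stratified sampling without giving detail, so the following is only a minimal sketch of how those two ideas might be combined, not the authors' implementation. It assumes that a parse is a tuple of `(head, relation)` pairs, that "agreement" means two parsers produce identical trees, and that tree depth serves as a stand-in for structural complexity. The function names `agreed_parses`, `tree_depth` and `stratified_sample` are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Assumption: a parse is a tuple of (head_index, relation) pairs, one per token,
# with 1-based token indices and head 0 marking the root. Trees are assumed
# non-empty and well-formed (no cycles).

def agreed_parses(parses_a, parses_b):
    """Tri-training-style filter: keep trees on which two parsers agree exactly."""
    return [a for a, b in zip(parses_a, parses_b) if a == b]

def tree_depth(parse):
    """Longest root-to-token path; a hypothetical proxy for structural complexity."""
    def depth(i):
        steps = 0
        while i != 0:              # walk up the head chain until the root
            i = parse[i - 1][0]
            steps += 1
        return steps
    return max(depth(i) for i in range(1, len(parse) + 1))

def stratified_sample(candidates, reference, n, key=tree_depth, seed=0):
    """Draw up to n candidates whose strata proportions follow the reference set."""
    rng = random.Random(seed)
    ref_counts = Counter(key(p) for p in reference)
    total = sum(ref_counts.values())
    buckets = defaultdict(list)
    for p in candidates:
        buckets[key(p)].append(p)
    sample = []
    for stratum, count in ref_counts.items():
        quota = round(n * count / total)       # share of n owed to this stratum
        pool = buckets.get(stratum, [])
        sample.extend(rng.sample(pool, min(quota, len(pool))))
    return sample
```

In this sketch, automatically parsed sentences would first pass through `agreed_parses` and then through `stratified_sample`, with the original treebank as `reference`, before being added to the training data. Because quotas are rounded per stratum, the result may contain slightly fewer or more than `n` trees.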
Topics
Machine Learning > Learning Types > Semi-Supervised Learning
Natural Language Processing > Understanding > Parsing
Machine Learning > Learning Paradigms > Transfer Learning
Machine Learning > Learning Paradigms > Self-Supervised Learning
Machine Learning > Core Methods > Structured Prediction
Natural Language Processing > Applications > Text Processing