Synthetic Data Made to Order: The Case of Parsing

Dingquan Wang; Jason Eisner

EMNLP 2018


Abstract

To approximately parse an unfamiliar language, it helps to have a treebank of a similar language. But what if the closest available treebank still has the wrong word order? We show how to (stochastically) permute the constituents of an existing dependency treebank so that its surface part-of-speech statistics approximately match those of the target language. The parameters of the permutation model can be evaluated for quality by dynamic programming and tuned by gradient descent (up to a local optimum). This optimization procedure yields trees for a new artificial language that resembles the target language. We show that delexicalized parsers for the target language can be successfully trained using such “made to order” artificial languages.
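The abstract does not spell out the permutation model, so the following is only a minimal sketch under our own assumptions, not the authors' parameterization. It treats each head and its dependent subtrees as units, scores each dependent with a hypothetical weight keyed by (head POS, dependent POS), and samples an order by Gumbel-perturbed sorting, which keeps every subtree contiguous. It then measures how far the permuted corpus's POS-bigram distribution is from a target distribution using KL divergence. Using bigrams as the "surface statistics", KL as the mismatch measure, and Monte Carlo sampling in place of the paper's dynamic programming are all assumptions; `children`, `permute`, `pos_bigrams`, and `kl` are hypothetical names.

```python
import math
import random
from collections import Counter

# Hypothetical toy encoding: a sentence is a list of (pos, head) pairs, where head
# is the 1-based index of the token's parent and 0 marks the root.


def children(sent):
    """Map each head index (0 = root) to the indices of its dependents."""
    kids = {i: [] for i in range(len(sent) + 1)}
    for i, (_, head) in enumerate(sent, start=1):
        kids[head].append(i)
    return kids


def permute(sent, weights, rng):
    """Sample a new surface order of POS tags, keeping the tree intact.

    At every head, the head word and each dependent subtree form units to be
    ordered. A dependent's score is weights[(head_pos, dep_pos)] (default 0) and
    the head's score is 0. Adding Gumbel noise and sorting ascending draws a
    Plackett-Luce ordering read right to left, so larger scores tend to land
    further right. Subtrees stay contiguous (a projective reordering).
    """
    kids = children(sent)

    def gumbel():
        u = max(rng.random(), 1e-300)  # guard against log(0)
        return -math.log(-math.log(u))

    def linearize(h):
        units = [(0.0, [h])] if h != 0 else []
        head_pos = sent[h - 1][0] if h != 0 else "ROOT"
        for d in kids[h]:
            score = weights.get((head_pos, sent[d - 1][0]), 0.0)
            units.append((score, linearize(d)))
        noisy = sorted(((s + gumbel(), span) for s, span in units), key=lambda x: x[0])
        return [i for _, span in noisy for i in span]

    return [sent[i - 1][0] for i in linearize(0)]


def pos_bigrams(corpus, weights, samples, seed=0):
    """Monte Carlo estimate of the POS-bigram distribution of the permuted corpus."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(samples):
        for sent in corpus:
            tags = ["<s>"] + permute(sent, weights, rng) + ["</s>"]
            counts.update(zip(tags, tags[1:]))
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}


def kl(target, model, eps=1e-9):
    """KL(target || model); eps stands in for bigrams the sample never produced."""
    return sum(p * math.log(p / model.get(bg, eps)) for bg, p in target.items() if p > 0)


if __name__ == "__main__":
    # "the dog barks": DET <- NOUN <- VERB (root), source order DET NOUN VERB.
    corpus = [[("DET", 2), ("NOUN", 3), ("VERB", 0)]]
    # Hypothetical target statistics for a verb-initial language: VERB DET NOUN.
    target = {("<s>", "VERB"): 0.25, ("VERB", "DET"): 0.25,
              ("DET", "NOUN"): 0.25, ("NOUN", "</s>"): 0.25}
    verb_final = {("VERB", "NOUN"): -5.0, ("NOUN", "DET"): -5.0}
    verb_initial = {("VERB", "NOUN"): 5.0, ("NOUN", "DET"): -5.0}
    for name, w in [("verb-final", verb_final), ("verb-initial", verb_initial)]:
        print(name, round(kl(target, pos_bigrams(corpus, w, 2000)), 3))
```

With these settings the verb-initial weights should yield the smaller divergence. Tuning would adjust `weights` to drive that number down; per the abstract, the paper scores parameters exactly by dynamic programming and follows gradients to a local optimum, whereas this sketch only samples and compares two hand-set weight settings.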

Topics

Machine Learning > Application Areas > Data Augmentation
Natural Language Processing > Understanding > Parsing
Interdisciplinary > Linguistics > Computational Linguistics
Machine Learning > Learning Paradigms > Transfer Learning
Interdisciplinary > Linguistics > Syntax

Keywords

transfer learning, domain adaptation, data augmentation, cross-lingual transfer, dependency parsing, synthetic data generation, permutation models, constituent parsing, treebank adaptation