DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks

Bosheng Ding; Linlin Liu; Lidong Bing; Canasai Kruengkrai; Thien Hai Nguyen; Shafiq Joty; Luo Si; Chunyan Miao

EMNLP 2020

Abstract

Data augmentation techniques have been widely used to improve machine learning performance because they facilitate generalization. In this work, we propose a novel augmentation method that generates high-quality synthetic data for low-resource tagging tasks, using language models trained on linearized labeled sentences. Our method is applicable to both supervised and semi-supervised settings. For the supervised setting, we conduct extensive experiments on named entity recognition (NER), part-of-speech (POS) tagging, and end-to-end target-based sentiment analysis (E2E-TBSA). For the semi-supervised setting, we evaluate our method on NER under two conditions: unlabeled data only, and unlabeled data plus a knowledge base. The results show that our method consistently outperforms the baselines, particularly when less gold training data is available.
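The abstract does not spell out how labeled sentences are linearized, so the following is only a minimal sketch under stated assumptions: each non-outside tag is placed immediately before its word, and outside (`O`) tags are dropped. The function names `linearize` and `delinearize` and the `outside` parameter are hypothetical, chosen for illustration. The resulting flat token sequence is the kind of input an ordinary language model could be trained on and sampled from.

```python
from typing import Iterable, List, Optional, Set, Tuple


def linearize(tokens: List[str], tags: List[str], outside: str = "O") -> List[str]:
    """Flatten a tagged sentence into one sequence (assumed scheme).

    Each tag other than `outside` is emitted right before its word;
    words tagged `outside` are emitted alone.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    sequence: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag != outside:
            sequence.append(tag)
        sequence.append(token)
    return sequence


def delinearize(
    sequence: Iterable[str], tag_set: Set[str], outside: str = "O"
) -> Tuple[List[str], List[str]]:
    """Recover (tokens, tags) from a linearized sequence.

    Any item found in `tag_set` is treated as the tag of the next word.
    A tag with no following word is silently dropped.
    """
    tokens: List[str] = []
    tags: List[str] = []
    pending: Optional[str] = None
    for item in sequence:
        if item in tag_set:
            pending = item
        else:
            tokens.append(item)
            tags.append(pending if pending is not None else outside)
            pending = None
    return tokens, tags


if __name__ == "__main__":
    words = ["Jose", "Valentin", "has", "a", "restless", "tone"]
    labels = ["B-PER", "I-PER", "O", "O", "O", "O"]
    flat = linearize(words, labels)
    print(" ".join(flat))
    print(delinearize(flat, {"B-PER", "I-PER"}))
```

Under this assumed scheme, text sampled from the trained language model can be passed through `delinearize` to obtain synthetic token and tag pairs for training a tagger.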

Topics

Machine Learning > Application Areas > Data Augmentation
Natural Language Processing > Understanding > Named Entity Recognition
Natural Language Processing > Understanding > Part-of-Speech Tagging

Keywords

data augmentation, named entity recognition, part-of-speech tagging, low-resource learning, language model
