GDA: Grammar-based Data Augmentation for Text Classification using Slot Information

Joonghyuk Hahn; Hyunjoon Cheon; Elizabeth Orwig; Su-Hyeon Kim; Sang-Ki Ko; Yo-Sub Han

EMNLP 2023

Abstract

Recent studies propose various data augmentation approaches to address the low-resource problem in natural language processing tasks. Data augmentation is an effective solution to this problem, and recent strategies vary sentence structures to boost performance. However, because the variation of sentence structures is unregulated, these approaches can introduce semantic errors and produce semantically noisy data. To combat these semantic errors, we leverage slot information, the representation of the context of keywords in a sentence, and propose a data augmentation strategy called GDA. Our strategy employs algorithms that use this slot information to construct and manipulate rules of context-aware grammar. The algorithms extract recurrent patterns by distinguishing words with slots and form the “rules of grammar” (a set of injective relations between a sentence’s semantics and its syntactic structure) to augment the dataset. The augmentation is performed automatically with the constructed rules; thus, GDA is explainable and reliable without any human intervention. We evaluate GDA against state-of-the-art data augmentation techniques, including those using pre-trained language models, and the results show that GDA outperforms all other data augmentation methods by 19.38%. Extensive experiments show that GDA is an effective data augmentation strategy that incorporates word semantics to produce more accurate and diverse data.
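The abstract describes extracting recurrent patterns by separating slot-bearing words from the rest of a sentence and then reusing those patterns to generate new data. The sketch below illustrates that general idea only, using simple template extraction and slot-filler substitution; it is not the GDA algorithm from the paper. The toy sentences, slot names, intent labels, and the function names `extract_rules` and `augment` are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy data: (intent label, [(word, slot), ...]); "O" marks a non-slot word.
DATA = [
    ("book_flight", [("book", "O"), ("a", "O"), ("flight", "O"), ("to", "O"), ("boston", "city")]),
    ("book_flight", [("fly", "O"), ("me", "O"), ("to", "O"), ("denver", "city")]),
    ("play_music", [("play", "O"), ("jazz", "genre"), ("music", "O")]),
]


def extract_rules(data):
    """Collect labeled sentence templates and the words observed for each slot."""
    templates = set()
    fillers = {}
    for label, tokens in data:
        pattern = []
        for word, slot in tokens:
            if slot == "O":
                # Non-slot words stay in the template verbatim.
                pattern.append(("word", word))
            else:
                # Slot words are abstracted to their slot name and remembered as fillers.
                pattern.append(("slot", slot))
                fillers.setdefault(slot, set()).add(word)
        templates.add((label, tuple(pattern)))
    return templates, fillers


def augment(templates, fillers):
    """Fill every template with every combination of known fillers for its slots."""
    augmented = set()
    for label, pattern in templates:
        choices = [
            sorted(fillers[value]) if kind == "slot" else [value]
            for kind, value in pattern
        ]
        for combo in product(*choices):
            augmented.add((label, " ".join(combo)))
    return augmented


if __name__ == "__main__":
    templates, fillers = extract_rules(DATA)
    for label, sentence in sorted(augment(templates, fillers)):
        print(label, "|", sentence)
```

Because a filler is only ever placed in a position that carries the same slot, the generated sentences keep their class label. In this toy run, "fly me to boston" and "book a flight to denver" are new sentences that do not occur in the input.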

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Application Areas > Data Augmentation
Natural Language Processing > Applications > Text Classification
Machine Learning > Learning Types > Data Augmentation
Deep Learning > Learning Types > Data Augmentation

Keywords

text classification, data augmentation, semantic preservation, slot information, grammar-based augmentation, semantic variation
