Post-Training with Interrogative Sentences for Enhancing BART-based Korean Question Generator

Gyu-Min Park; Seong-Eun Hong; Seong-Bae Park

AACL 2022

Abstract

Pre-trained language models such as KoBART often fail to generate perfect interrogative sentences when they are applied to Korean question generation. This is mainly because the language models have seen far more declarative sentences than interrogative ones. Therefore, this paper proposes a novel post-training of KoBART to enhance it for Korean question generation. The enhancement of KoBART is accomplished in three ways: (i) introducing a question infilling objective to KoBART so that it focuses more on the structure of interrogative sentences, (ii) augmenting the training data for question generation with another data set to cope with the lack of training instances for post-training, and (iii) introducing a Korean spacing objective so that KoBART learns the linguistic features of Korean. Since there is no standard data set for Korean question generation, this paper also proposes KorQuAD-QG, a new data set for this task, to verify the performance of the proposed post-training. Our code is publicly available at https://github.com/gminipark/post_training_qg
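To make the question infilling and Korean spacing objectives more concrete, the following is a minimal sketch of how training pairs for such objectives could be built. It is not the authors' implementation: the abstract does not specify the masking scheme, so the single-span masking, the `span_len` parameter, the whitespace tokenization, the `<mask>` string, and the function names `make_infilling_example` and `make_spacing_example` are all assumptions made for illustration.

```python
import random

# Assumed mask string; a real KoBART tokenizer defines its own mask token.
MASK = "<mask>"


def make_spacing_example(sentence):
    """Build a (source, target) pair for a spacing objective.

    The source has every space removed; the target is the sentence with
    normalized single spaces, which the model would learn to restore.
    """
    target = " ".join(sentence.split())
    source = target.replace(" ", "")
    return source, target


def make_infilling_example(question, span_len=2, rng=None):
    """Build a (source, target) pair for a question infilling objective.

    One contiguous span of whitespace-separated tokens is replaced by a
    single mask token (an assumed scheme); the target is the full question.
    """
    rng = rng or random.Random()
    tokens = question.split()
    if not tokens:
        raise ValueError("question must contain at least one token")
    span_len = max(1, min(span_len, len(tokens)))
    start = rng.randrange(len(tokens) - span_len + 1)
    masked = tokens[:start] + [MASK] + tokens[start + span_len:]
    return " ".join(masked), " ".join(tokens)


if __name__ == "__main__":
    question = "대한민국의 수도는 어디인가?"
    print(make_spacing_example(question))
    print(make_infilling_example(question, rng=random.Random(0)))
```

In both cases the target is the original, correctly spaced question, so the two pair types could be mixed in one sequence-to-sequence post-training run; whether the paper combines them this way is not stated in the abstract.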

Authors

Gyu-Min Park, Seong-Eun Hong, Seong-Bae Park

Topics

Machine Learning > Application Areas > Knowledge Distillation

Natural Language Processing > Generation > Language Modeling

Natural Language Processing > Generation > Text Generation

Deep Learning > Techniques > Transfer Learning

Keywords

knowledge distillation, text generation, question generation, pre-trained language model, Korean language, interrogative sentence
