Does the Order of Training Samples Matter? Improving Neural Data-to-Text Generation with Curriculum Learning

Ernie Chang; Hui-Syuan Yeh; Vera Demberg

EACL 2021

Abstract

Recent advances in data-to-text generation largely take the form of neural end-to-end systems. One line of work improves text generation systems by changing the order in which training samples are presented, a process known as curriculum learning. Past research on sequence-to-sequence learning has shown that curriculum learning improves both performance and convergence speed. In this work, we apply the same idea to training samples consisting of structured data and text pairs, where at each update the curriculum framework selects training samples based on the model’s competence. Specifically, we experiment with various difficulty metrics and put forward a soft edit distance metric for ranking training samples. On our benchmarks, it converges faster, reducing training time by 38.7%, and improves performance by 4.84 BLEU.
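To make the idea of competence-based sample selection concrete, here is a minimal sketch. It rests on several assumptions that the abstract does not confirm: the difficulty score is a plain Levenshtein distance between the data tokens and the text tokens (a stand-in, not the paper's soft edit distance), the competence schedule is a square-root curve with a hypothetical starting value `c0`, and all function names are illustrative.

```python
import math
import random


def edit_distance(a, b):
    """Plain Levenshtein distance between two sequences (stand-in difficulty metric)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def competence(step, total_steps, c0=0.1):
    """Assumed square-root schedule: rises from c0 at step 0 to 1.0 at total_steps."""
    t = min(step / total_steps, 1.0)
    return min(1.0, math.sqrt(t * (1 - c0 ** 2) + c0 ** 2))


def rank_by_difficulty(pairs):
    """Sort (data, text) pairs easiest first and attach a percentile in (0, 1]."""
    scored = sorted(pairs, key=lambda p: edit_distance(p[0], p[1]))
    n = len(scored)
    return [(p, (i + 1) / n) for i, p in enumerate(scored)]


def sample_batch(ranked, step, total_steps, batch_size, rng=random):
    """Draw a batch uniformly from samples whose percentile is within competence."""
    c = competence(step, total_steps)
    eligible = [p for p, pct in ranked if pct <= c]
    if not eligible:
        # Always keep at least the easiest sample available.
        eligible = [ranked[0][0]]
    return rng.choices(eligible, k=batch_size)
```

Under these assumptions, early updates draw only from the easiest pairs, and the pool widens to the full training set as `step` approaches `total_steps`. Swapping in a different difficulty metric only requires changing the `key` function in `rank_by_difficulty`.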

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization
Machine Learning > Learning Types > Curriculum Learning

Keywords

curriculum learning, data-to-text generation, neural network, soft edit distance
