Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation

Pei Ke; Haozhe Ji; Zhenyu Yang; Yi Huang; Junlan Feng; Xiaoyan Zhu; Minlie Huang

2022 IJCAI IJCAI 2022

Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation

Abstract

Despite the success of text-to-text pre-trained models in various natural language generation (NLG) tasks, the generation performance is largely restricted by the number of labeled data in downstream tasks, particularly in data-to-text generation tasks. Existing works mostly utilize abundant unlabeled structured data to conduct unsupervised pre-training for task adaption, which fail to model the complex relationship between source structured data and target texts. Thus, we introduce self-training as a better few-shot learner than task-adaptive pre-training, which explicitly captures this relationship via pseudo-labeled data generated by the pre-trained model. To alleviate the side-effect of low-quality pseudo-labeled data during self-training, we propose a novel method called Curriculum-Based Self-Training (CBST) to effectively leverage unlabeled data in a rearranged order determined by the difficulty of text generation. Experimental results show that our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — pseudo-labeled datum

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Pei Ke , Haozhe Ji , Zhenyu Yang , Yi Huang , Junlan Feng , Xiaoyan Zhu , Minlie Huang

Topics

Artificial Intelligence > Learning Paradigms > Few-Shot Learning Machine Learning > Learning Types > Self-Supervised Learning Natural Language Processing > Generation > Text Generation

Keywords

few-shot learning curriculum learning data-to-text generation pseudo-labeled datum

Download PDF

Related papers

Better Collective Decisions via Uncertainty Reduction 2022

Mixed Strategies for Security Games with General Defending Requirements 2022

Achieving Envy-Freeness with Limited Subsidies under Dichotomous Valuations 2022

Distortion in Voting with Top-t Preferences 2022

Let’s Agree to Agree: Targeting Consensus for Incomplete Preferences through Majority Dynamics 2022