When does Further Pre-training MLM Help? An Empirical Study on Task-Oriented Dialog Pre-training

Qi Zhu; Yuxian Gu; Lingxiao Luo; Bing Li; Cheng Li; Wei Peng; Minlie Huang; Xiaoyan Zhu

EMNLP 2021

Abstract

Further pre-training language models on in-domain data (domain-adaptive pre-training, DAPT) or task-relevant data (task-adaptive pre-training, TAPT) before fine-tuning has been shown to improve downstream task performance. However, in task-oriented dialog modeling, we observe that further pre-training with masked language modeling (MLM) does not always boost performance on a downstream task. We find that DAPT is beneficial in the low-resource setting, but as the fine-tuning data size grows, DAPT becomes less beneficial or even useless, and scaling the size of DAPT data does not help. Through Representational Similarity Analysis, we conclude that more fine-tuning data yields a greater change in the model’s representations and thus reduces the influence of initialization.
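To make the Representational Similarity Analysis step more concrete, the following is a minimal sketch of one common way to compute an RSA score between two sets of representations of the same inputs (for example, a model before and after fine-tuning). The abstract does not specify the exact variant used, so the cosine similarity measure, the Pearson correlation, and the function names `pairwise_cosine` and `rsa_score` are assumptions made for illustration only.

```python
import numpy as np


def pairwise_cosine(reps):
    """Return the n x n cosine similarity matrix for n row vectors."""
    norms = np.linalg.norm(reps, axis=1, keepdims=True)
    unit = reps / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def rsa_score(reps_a, reps_b):
    """Correlate the pairwise-similarity structure of two representation sets.

    reps_a and reps_b hold representations of the same n inputs, one row per
    input; their feature dimensions may differ. Cosine similarity and Pearson
    correlation are illustrative choices, not necessarily the paper's.
    """
    if reps_a.shape[0] != reps_b.shape[0]:
        raise ValueError("both representation sets must cover the same inputs")
    sim_a = pairwise_cosine(reps_a)
    sim_b = pairwise_cosine(reps_b)
    # Use only the strict upper triangle: each input pair once, no diagonal.
    upper = np.triu_indices(reps_a.shape[0], k=1)
    return float(np.corrcoef(sim_a[upper], sim_b[upper])[0, 1])
```

Under this sketch, a score near 1 means the two models arrange the inputs in nearly the same similarity structure, while a lower score indicates that the representations have changed more, which is the kind of change the abstract attributes to larger fine-tuning sets.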

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Natural Language Processing > Generation > Language Modeling
Natural Language Processing > Resources & Methods > Large Language Models
Machine Learning > Learning Paradigms > Transfer Learning
Machine Learning > Learning Types > Transfer Learning
Artificial Intelligence > Core AI > Large Language Models
Natural Language Processing > Applications > Dialogue Systems
Deep Learning > Learning Types > Self-Supervised Learning
Deep Learning > Learning Types > Transfer Learning

Keywords

representation learning, masked language model, masked language modeling, representational similarity analysis, task-oriented dialog, domain-adaptive pre-training, task-adaptive pre-training
