Muppet: Massive Multi-task Representations with Pre-Finetuning

Armen Aghajanyan; Anchit Gupta; Akshat Shrivastava; Xilun Chen; Luke Zettlemoyer; Sonal Gupta

EMNLP 2021

Abstract

We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples) and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g., RoBERTa) and generation models (e.g., BART) on a wide range of tasks (sentence prediction, commonsense reasoning, MRC, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial: pre-finetuning can hurt performance when few tasks are used, up until a critical point (usually above 15 tasks), after which performance improves linearly in the number of tasks.

Authors

Armen Aghajanyan, Anchit Gupta, Akshat Shrivastava, Xilun Chen, Luke Zettlemoyer, Sonal Gupta

Topics

Artificial Intelligence > Core AI > Foundation Models
Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Core Methods > Representation Learning
Natural Language Processing > Resources & Methods > Transfer Learning
Machine Learning > Learning Paradigms > Multi-Task Learning
Deep Learning > Learning Types > Transfer Learning

Keywords

representation learning, sample efficiency, multi-task learning, transfer learning, language model, commonsense reasoning, transformer model
