Revisiting Pretraining with Adapters

Seungwon Kim; Alex Shum; Nathan Susanj; Jonathan Hilgart

ACL 2021

Abstract

Pretrained language models have served as the backbone for many state-of-the-art NLP results. These models are large and expensive to train. Recent work suggests that continued pretraining on task-specific data is worth the effort, as it leads to improved performance on downstream tasks. We explore alternatives to full-scale task-specific pretraining of language models through the use of adapter modules, a parameter-efficient approach to transfer learning. We find that adapter-based pretraining achieves results comparable to task-specific pretraining while using a fraction of the overall trainable parameters. We further explore the direct use of adapters without pretraining and find that direct fine-tuning performs mostly on par with pretrained adapter models, contradicting the previously proposed benefits of continual pretraining in full pretraining fine-tuning strategies. Lastly, we perform an ablation study on task-adaptive pretraining to investigate how different hyperparameter settings change the effectiveness of pretraining.
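To make the "fraction of the overall trainable parameters" claim concrete, the following is a minimal sketch of a bottleneck adapter and its parameter count. It is not the authors' implementation. The hidden size (768), bottleneck size (64), layer count (12), one adapter per layer, the ReLU nonlinearity, and the rough 125M backbone size are all illustrative assumptions, not figures taken from the paper.

```python
# Illustrative sketch only: a bottleneck adapter (down-project, nonlinearity,
# up-project, residual connection) written with plain Python lists.
# All sizes below are assumptions for illustration, not values from the paper.

def adapter_param_count(hidden_size: int, bottleneck_size: int) -> int:
    """Weights and biases of one down-projection plus one up-projection."""
    down = hidden_size * bottleneck_size + bottleneck_size
    up = bottleneck_size * hidden_size + hidden_size
    return down + up


def adapter_forward(h, w_down, b_down, w_up, b_up):
    """Apply one adapter to a single hidden vector h (list of floats).

    w_down has shape [hidden][bottleneck]; w_up has shape [bottleneck][hidden].
    """
    # Down-projection followed by ReLU (the nonlinearity is an assumption).
    z = [
        max(0.0, sum(h[i] * w_down[i][j] for i in range(len(h))) + b_down[j])
        for j in range(len(b_down))
    ]
    # Up-projection back to the hidden size.
    out = [
        sum(z[j] * w_up[j][k] for j in range(len(z))) + b_up[k]
        for k in range(len(b_up))
    ]
    # Residual connection: the adapter only adds a correction to h.
    return [h_k + out_k for h_k, out_k in zip(h, out)]


# Assumed, illustrative configuration.
HIDDEN_SIZE = 768
BOTTLENECK_SIZE = 64
NUM_LAYERS = 12
ASSUMED_BACKBONE_PARAMS = 125_000_000  # rough size of a base-sized encoder

adapter_params = NUM_LAYERS * adapter_param_count(HIDDEN_SIZE, BOTTLENECK_SIZE)
trainable_fraction = adapter_params / ASSUMED_BACKBONE_PARAMS

if __name__ == "__main__":
    print(f"adapter parameters: {adapter_params}")
    print(f"fraction of assumed backbone: {trainable_fraction:.4f}")
```

Under these assumptions the backbone weights stay frozen and only the adapter weights are updated, so the trainable set is roughly one percent of the model. With the up-projection initialized to zero, the residual connection makes the adapter an identity map at the start of training.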

Topics

Artificial Intelligence > Core AI > Model Compression; Artificial Intelligence > Learning Paradigms > Transfer Learning; Machine Learning > Application Areas > Knowledge Distillation; Machine Learning > Application Areas > Model Merging; Natural Language Processing > Resources & Methods > Large Language Models; Machine Learning > Learning Types > Transfer Learning

Keywords

model compression; transfer learning; knowledge distillation; parameter-efficient learning; parameter-efficient transfer learning; continued pretraining; downstream task; language model; fine-tuning; continual pretraining; adapter module; fine-tuning strategy; parameter-efficient transfer; task-specific pretraining; task-adaptive pretraining
