SMASH: Improving SMAll Language Models’ Few-SHot Ability with Prompt-Based Distillation

Yueqian Wang; Chang Liu; Kai Chen; Xi Wang; Dongyan Zhao

2022 EMNLP EMNLP 2022

SMASH: Improving SMAll Language Models’ Few-SHot Ability with Prompt-Based Distillation

Abstract

AbstractLarge-scale language models coupled with prompts have shown remarkable performance on few-shot learning. However, through systematic experiments, we find that the few-shot performance of small language models is poor, and using prompts on them brings fewer improvements than on larger ones. In this paper, we propose SMASH, an approach to improve SMAll language models’ few-SHot ability by training on intermediate tasks before prompt-based fine-tuning on downstream tasks. We design intermediate tasks for sentence-pair tasks and sentiment classification tasks by creating training examples with prompt templates similar to downstream tasks using sentences sampled from a large-scale unsupervised corpus, and apply knowledge distillation to distill from outputs of larger pre-trained models as the training objective. We conduct extensive experiments and show that SMASH can make a 6-layer DistilRoBRETa-base achieve comparable performance on few-shot datasets with a 12-layer RoBERTa-base at a low cost.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — small language model

🐣 Hot Topic Early Bird — small language model

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio