What’s in Your Head? Emergent Behaviour in Multi-Task Transformer Models

Mor Geva; Uri Katz; Aviv Ben-Arie; Jonathan Berant

2021 EMNLP EMNLP 2021

What’s in Your Head? Emergent Behaviour in Multi-Task Transformer Models

Abstract

AbstractThe primary paradigm for multi-task training in natural language processing is to represent the input with a shared pre-trained language model, and add a small, thin network (head) per task. Given an input, a target head is the head that is selected for outputting the final prediction. In this work, we examine the behaviour of non-target heads, that is, the output of heads when given input that belongs to a different task than the one they were trained for. We find that non-target heads exhibit emergent behaviour, which may either explain the target task, or generalize beyond their original task. For example, in a numerical reasoning task, a span extraction head extracts from the input the arguments to a computation that results in a number generated by a target generative head. In addition, a summarization head that is trained with a target question answering head, outputs query-based summaries when given a question and a context from which the answer is to be extracted. This emergent behaviour suggests that multi-task training leads to non-trivial extrapolation of skills, which can be harnessed for interpretability and generalization.

❓ The Questioner

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Natural Language Processing

🧭 Keyword Pioneer — task head

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Mor Geva , Uri Katz , Aviv Ben-Arie , Jonathan Berant

Topics

Artificial Intelligence > Core AI > Interpretability Deep Learning > Architectures > Transformers Natural Language Processing > Resources & Methods > Large Language Models Artificial Intelligence > Learning Paradigms > Multi-Task Learning Deep Learning > Learning Types > Multi-Task Learning

Keywords

transfer learning pretrained language model multi-task training emergent behavior transformer model task head skill extrapolation emergent behaviour

Download PDF

Related papers

Continual Learning in Multilingual NMT via Language-Specific Embeddings 2021

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents 2021

Efficient Multi-Task Auxiliary Learning: Selecting Auxiliary Data by Feature Similarity 2021

Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings 2021

Semantics-Preserved Data Augmentation for Aspect-Based Sentiment Analysis 2021