Uncovering Hidden Consequences of Pre-training Objectives in Sequence-to-Sequence Models

Tannon Kew; Rico Sennrich

2023 ACL ACL 2023

Uncovering Hidden Consequences of Pre-training Objectives in Sequence-to-Sequence Models

Abstract

AbstractSome variants of self-supervised denoising objectives for pre-training encoder-decoder language models have been reported to have a negligible impact on downstream performance. Yet the design of these pre-training objectives leads to behavioural differences that can be uncovered with specific manipulations. We reproduce a recently proposed zero-shot control method and find that it is only successful on a subset of models. To understand what causes the difference in its effectiveness, we perform a set of controlled experiments, varying only the pre-training objective, and find unexpected interactions between the pre-training method and downstream controllability of models after fine-tuning. Our results show that different pre-training objectives have consequences that may not be visible in standard downstream evaluation, but which should be taken into account when developing models with controllability in mind.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — denoising objective

🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio