Mixture-of-Linguistic-Experts Adapters for Improving and Interpreting Pre-trained Language Models

Raymond Li; Gabriel Murray; Giuseppe Carenini

EMNLP 2023

Abstract

In this work, we propose a method that combines two popular research areas by injecting linguistic structures into pre-trained language models in the parameter-efficient fine-tuning (PEFT) setting. In our approach, parallel adapter modules encoding different linguistic structures are combined using a novel Mixture-of-Linguistic-Experts architecture, where Gumbel-Softmax gates determine the importance of these modules at each layer of the model. To reduce the number of parameters, we first train the model for a small, fixed number of steps and then prune the experts based on their importance scores. Our experimental results with three different pre-trained models show that our approach can outperform state-of-the-art PEFT methods with a comparable number of parameters. In addition, we analyze the experts selected by each model at each layer to provide insights for future studies.
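The gating and pruning steps described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the function names, the temperature `tau`, and the choice to score importance as the softmax of each layer's gate logits are assumptions made for illustration, and real adapter outputs would be tensors rather than short lists.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Sample soft gate weights: add Gumbel(0, 1) noise, scale by 1/tau, softmax."""
    noisy = []
    for logit in logits:
        # Clamp the uniform sample away from 0 and 1 so both logs are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / tau)
    return softmax(noisy)


def mix_experts(expert_outputs, gate_weights):
    """Combine parallel expert outputs (equal-length vectors) as a weighted sum."""
    dim = len(expert_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(gate_weights, expert_outputs))
        for i in range(dim)
    ]


def prune_experts(layer_logits, keep=1):
    """Per layer, keep the indices of the `keep` highest-scoring experts.

    Assumption: importance is the softmax of the learned gate logits.
    """
    kept = []
    for logits in layer_logits:
        scores = softmax(logits)
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        kept.append(sorted(ranked[:keep]))
    return kept


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical gate logits for three experts at one layer.
    weights = gumbel_softmax([0.5, 0.2, -0.3], tau=0.5, rng=rng)
    mixed = mix_experts([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], weights)
    print(weights, mixed)
    # Hypothetical logits after a short training phase, two layers.
    print(prune_experts([[2.0, 0.1, -1.0], [0.0, 3.0, 1.0]], keep=1))
```

In this sketch the sampled weights change from call to call because of the Gumbel noise, whereas pruning uses the noise-free logits so that the selection of experts per layer is deterministic.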

Authors

Raymond Li, Gabriel Murray, Giuseppe Carenini

Topics

Deep Learning > Architectures > Transformers
Deep Learning > Techniques > Model Architecture
Natural Language Processing > Resources & Methods > Large Language Models
Machine Learning > Application Areas > Model Compression
Artificial Intelligence > Core AI > Large Language Models
Machine Learning > Learning Types > Knowledge Distillation
Artificial Intelligence > Core AI > Knowledge Distillation
Deep Learning > Learning Types > Parameter-Efficient Fine-Tuning

Keywords

parameter-efficient fine-tuning, mixture of experts, pre-trained language model, linguistic structure, adapter module
