Extending Input Contexts of Language Models through Training on Segmented Sequences

Petros Karypis; Julian McAuley; George Karypis

NAACL 2024

Abstract

Effectively training language models on long inputs poses many technical challenges. As a cost consideration, language models are pre-trained on a fixed sequence length before being adapted to longer sequences. We explore various methods for adapting models to longer inputs by training on segmented sequences and an interpolation-based method for extending absolute positional embeddings. We develop a training procedure to extend the input context size of pretrained models with no architectural changes and no additional memory cost compared with training on the original input lengths. By sub-sampling segments from long inputs while maintaining their original positions, the model is able to learn new positional interactions. Our method benefits both models trained with absolute positional embeddings, by extending their input contexts, and popular relative positional embedding methods, which show reduced perplexity on sequences longer than those they were trained on. We demonstrate that our method can extend input contexts by a factor of 4× while improving perplexity.
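The two ideas named in the abstract, sub-sampling segments that keep their original positions and interpolating an absolute positional embedding table, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `sample_segments` and `interpolate_embeddings`, the uniform choice of segment offsets, and the use of plain linear interpolation are all assumptions made for illustration, since the abstract does not specify these details.

```python
import random


def sample_segments(tokens, num_segments, segment_len, rng=None):
    """Pick non-overlapping segments from a long input and keep each
    token's ORIGINAL position index (hypothetical helper, illustration only)."""
    rng = rng or random.Random()
    total = num_segments * segment_len
    if total > len(tokens):
        raise ValueError("segments do not fit in the input")
    # Tokens left over after placing all segments; they become the gaps.
    slack = len(tokens) - total
    # Sorted random offsets guarantee the segments are ordered and disjoint:
    # start_i = cut_i + i * segment_len, with cut_i non-decreasing.
    cuts = sorted(rng.randint(0, slack) for _ in range(num_segments))
    sampled, position_ids = [], []
    for i, cut in enumerate(cuts):
        start = cut + i * segment_len
        sampled.extend(tokens[start:start + segment_len])
        position_ids.extend(range(start, start + segment_len))
    return sampled, position_ids


def interpolate_embeddings(table, new_len):
    """Stretch a positional embedding table (list of vectors) to new_len rows
    by linear interpolation (an assumed scheme, not taken from the paper)."""
    old_len = len(table)
    if old_len < 1 or new_len < 2:
        raise ValueError("need a non-empty table and new_len >= 2")
    out = []
    for j in range(new_len):
        # Map the new index j onto the old index range [0, old_len - 1].
        x = j * (old_len - 1) / (new_len - 1)
        lo = int(x)
        hi = min(lo + 1, old_len - 1)
        w = x - lo
        out.append([(1 - w) * a + w * b for a, b in zip(table[lo], table[hi])])
    return out
```

In this sketch the model would receive `num_segments * segment_len` tokens per example, which can be set equal to the original training length, while `position_ids` spans the longer context. That is the sense in which training sees new positional interactions without a longer input.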

Topics

Machine Learning > Optimization & Theory > Optimization
Machine Learning > Application Areas > Efficient Computing
Natural Language Processing > Generation > Language Modeling

Keywords

context extension, sequence length, absolute positional embedding, relative positional embedding

Related papers

Working Alliance Transformer for Psychotherapy Dialogue Classification 2024

Named Entity Recognition Under Domain Shift via Metric Learning for Life Sciences 2024

Assessing Logical Puzzle Solving in Large Language Models: Insights from a Minesweeper Case Study 2024

TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation 2024

Extractive Summarization with Text Generator 2024