TRAMS: Training-free Memory Selection for Long-range Language Modeling

Haofei Yu; Cunxiang Wang; Yue Zhang; Wei Bi

EMNLP 2023

Abstract

The Transformer architecture is crucial for numerous AI models, but it still faces challenges in long-range language modeling. Although several specialized Transformer architectures have been designed to handle long-range dependencies, existing methods such as Transformer-XL suffer from a high percentage of ineffective memories. In this study, we present a plug-and-play strategy, TRAining-free Memory Selection (TRAMS), that selects the tokens participating in attention calculation based on one simple metric. This strategy keeps the tokens that are likely to have a high attention score with the current queries and ignores the rest. We test our approach on a word-level benchmark (WikiText-103) and a character-level benchmark (enwik8), and the results indicate an improvement without additional training or additional parameters.
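The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which metric TRAMS uses, so the `key_norm` function below (the L2 norm of each memory key) is only a placeholder assumption, and the names `select_memory` and `attend` are hypothetical. The sketch shows the general shape of the idea: rank cached memory tokens by a query-independent score, keep the top few, and run ordinary attention over the survivors with no extra parameters or training.

```python
import math

def key_norm(key):
    """Placeholder metric (an assumption, not taken from the abstract): L2 norm of a key."""
    return math.sqrt(sum(x * x for x in key))

def select_memory(mem_keys, mem_values, keep, metric=key_norm):
    """Keep the `keep` memory tokens with the highest metric, preserving their order."""
    ranked = sorted(range(len(mem_keys)),
                    key=lambda i: metric(mem_keys[i]), reverse=True)
    kept = sorted(ranked[:keep])  # restore original token order
    return [mem_keys[i] for i in kept], [mem_values[i] for i in kept]

def attend(query, keys, values):
    """Single-query scaled dot-product attention over the given keys and values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]

# Toy memory of four cached tokens with 2-dimensional keys and values.
mem_keys = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2], [1.5, -1.0]]
mem_values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]

# Keep only the two highest-scoring memory tokens, then attend over them.
sel_keys, sel_values = select_memory(mem_keys, mem_values, keep=2)
output = attend([1.0, 0.5], sel_keys, sel_values)
```

Because the selection happens before attention and uses only the cached keys, a step like this could in principle be dropped into an existing memory-augmented model at inference time, which is the sense in which the abstract calls the strategy plug-and-play.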

Topics

Artificial Intelligence > Core AI > Memory
Deep Learning > Architectures > Transformers
Machine Learning > Learning Types > Supervised Learning
Machine Learning > Learning Types > Deep Learning
Deep Learning > Optimization & Theory > Neural Network Optimization
Artificial Intelligence > Core AI > Language
Deep Learning > Optimization & Theory > Efficient Computing
Deep Learning > Techniques > Attention
Deep Learning > Models > Language Models

Keywords

transformer architecture; attention mechanism; token selection; language modeling; long-range dependency; memory selection
