Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Yaroslav Aksenov; Nikita Balagansky; Sofia Lo Cicero Vaina; Boris Shaposhnikov; Alexey Gorbatovski; Daniil Gavrilov

ACL 2024

Abstract

Advancing the frontier of subquadratic architectures for Language Models (LMs) is crucial in the rapidly evolving field of natural language processing. Current innovations, including State Space Models, were initially celebrated for surpassing Transformer performance on language modeling tasks. However, these models have revealed deficiencies in essential In-Context Learning capabilities, a domain where the Transformer traditionally shines. The Based model emerged as a hybrid solution, blending a Linear Transformer with a kernel inspired by the Taylor expansion of exponential functions, augmented by convolutional networks. Mirroring the Transformer's in-context adeptness, it became a strong contender in the field. In our work, we present a single, elegant alteration to the Based kernel that improves its In-Context Learning abilities, as evaluated on the Multi-Query Associative Recall task, and its overall language modeling performance, as demonstrated on the Pile dataset.
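To make the idea of a Linear Transformer with a Taylor-expansion kernel concrete, the following is a minimal NumPy sketch. It is not the authors' implementation and does not include the alteration proposed in the paper. It assumes a second-order expansion, exp(s) ≈ 1 + s + s²/2, and the function names (`taylor_feature_map`, `causal_linear_attention`, `causal_quadratic_reference`) are hypothetical. Scaling, convolutions, and multi-head structure are omitted.

```python
import numpy as np

def taylor_feature_map(x):
    """Map x (..., d) to features phi(x) with phi(q) . phi(k) = 1 + q.k + (q.k)^2 / 2.

    Assumption: second-order Taylor expansion of exp; illustrative only.
    """
    ones = np.ones(x.shape[:-1] + (1,))
    # Flattened outer product: (x (x) x) . (y (x) y) = (x . y)^2
    outer = x[..., :, None] * x[..., None, :]
    second = outer.reshape(*x.shape[:-1], -1) / np.sqrt(2.0)
    return np.concatenate([ones, x, second], axis=-1)

def causal_linear_attention(q, k, v):
    """Causal attention computed with running sums; q, k: (n, d), v: (n, d_v)."""
    fq, fk = taylor_feature_map(q), taylor_feature_map(k)
    kv_state = np.zeros((fk.shape[-1], v.shape[-1]))  # running sum of phi(k_i) v_i^T
    k_state = np.zeros(fk.shape[-1])                  # running sum of phi(k_i)
    out = np.empty_like(v, dtype=float)
    for t in range(q.shape[0]):
        kv_state += np.outer(fk[t], v[t])
        k_state += fk[t]
        out[t] = fq[t] @ kv_state / (fq[t] @ k_state)
    return out

def causal_quadratic_reference(q, k, v):
    """Same computation written as a masked n x n attention matrix, for comparison."""
    s = q @ k.T
    w = np.tril(1.0 + s + 0.5 * s**2)
    return (w / w.sum(axis=-1, keepdims=True)) @ v
```

In this sketch the recurrent state has size 1 + d + d² per value dimension, independent of sequence length, which is what makes the linear form subquadratic in the number of tokens.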

Topics

Artificial Intelligence > Core AI > Foundation Models; Machine Learning > Core Methods > Representation Learning; Deep Learning > Architectures > Transformers; Deep Learning > Learning Types > Representation Learning

Keywords

in-context learning, language modeling, state space model, language model, linear transformer, kernel function, learnable kernel
