Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Benjamin Warner; Antoine Chaffin; Benjamin Clavié; Orion Weller; Oskar Hallström; Said Taghadouini; Alexis Gallagher; Raja Biswas; Faisal Ladhak; Tom Aarsen; Griffin Thomas Adams; Jeremy Howard; Iacopo Poli

ACL 2025

Abstract

Encoder-only transformer models such as BERT offer a strong performance-size tradeoff for retrieval and classification tasks compared to larger decoder-only models. Although these encoders are the workhorse of numerous production pipelines, there have been limited Pareto improvements to BERT since its release. In this paper, we introduce ModernBERT, which brings modern model optimizations to encoder-only models and represents a major Pareto improvement over older encoders. Trained on 2 trillion tokens with a native sequence length of 8192, ModernBERT models achieve state-of-the-art results on a large pool of evaluations encompassing diverse classification tasks and both single-vector and multi-vector retrieval across different domains (including code). In addition to strong downstream performance, ModernBERT is also the most speed- and memory-efficient encoder and is designed for inference on common GPUs.

Topics

Machine Learning > Application Areas > Efficient Computing
Deep Learning > Architectures > Transformers
Natural Language Processing > Resources & Methods > Large Language Models
Deep Learning > Optimization & Theory > Efficient Computing

Keywords

transformer architecture, efficient computing, text retrieval, memory efficiency, long context, model efficiency, bidirectional encoder, encoder model, sequence length, large language model, transformer model
