Layer Duplication in LLMs

Neo Eyal; Nachum Dershowitz; Kfir Bar

2025 EMNLP EMNLP 2025

Layer Duplication in LLMs

Abstract

AbstractWe investigate the effect of duplicating multihead self-attention layers in large language models (LLMs) across a range of language tasks, with and without fine-tuning. The results demonstrate that duplicating the initial layers once or twice often yields a significant performance boost. Attention analysis uncovered the underlying mechanisms driving the improvement when performing layer duplication. This method enhances LLM capabilities with or without additional training or labeled data.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Neo Eyal , Nachum Dershowitz , Kfir Bar

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Deep Learning > Architectures > Transformers Natural Language Processing > Resources & Methods > Large Language Models Artificial Intelligence > Core AI > Large Language Models

Keywords

transformer architecture attention mechanism parameter efficient model scaling parameter efficiency multi-head attention large language model neural network layer duplication

Download PDF

Related papers

Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework 2025

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing 2025

Model-based Large Language Model Customization as Service 2025

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration 2025

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design 2025