Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs

Dawid Jan Kopiczko; Tijmen Blankevoort; Yuki M Asano

EMNLP 2025

Abstract

Decoder-only large language models typically rely solely on masked causal attention, which limits their expressiveness by restricting information flow to one direction. We propose Bitune, a method that enhances pretrained decoder-only LLMs by incorporating bidirectional attention into prompt processing. We evaluate Bitune in instruction-tuning and question-answering settings, showing significant improvements in performance on commonsense reasoning, arithmetic, and language understanding tasks. Furthermore, extensive ablation studies validate the role of each component of the method, and demonstrate that Bitune is compatible with various parameter-efficient finetuning techniques and full model finetuning.
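To make the contrast between causal and bidirectional prompt attention concrete, here is a minimal sketch of the two attention masks. It is an illustration only, not the authors' implementation: the abstract does not specify how Bitune combines the two attention modes, and the function names `prefix_bidirectional_mask` and `causal_mask` and the parameters `prompt_len` and `total_len` are hypothetical.

```python
def prefix_bidirectional_mask(prompt_len, total_len):
    """Build a total_len x total_len boolean attention mask.

    mask[i][j] is True when query position i may attend to key position j.
    Assumption for illustration: positions 0..prompt_len-1 are the prompt and
    attend to each other in both directions; all later positions (the
    response) attend causally to everything at or before them.
    """
    if not 0 <= prompt_len <= total_len:
        raise ValueError("prompt_len must be between 0 and total_len")
    mask = []
    for i in range(total_len):
        row = []
        for j in range(total_len):
            causal = j <= i  # standard decoder-only rule
            within_prompt = i < prompt_len and j < prompt_len  # bidirectional block
            row.append(causal or within_prompt)
        mask.append(row)
    return mask


def causal_mask(total_len):
    """Plain causal mask: the special case with an empty bidirectional prefix."""
    return prefix_bidirectional_mask(0, total_len)


if __name__ == "__main__":
    # Three prompt tokens followed by two response tokens.
    for row in prefix_bidirectional_mask(3, 5):
        print("".join("1" if allowed else "." for allowed in row))
```

In the printed mask, the top-left 3 x 3 block is fully filled (prompt tokens see the whole prompt), while the rows for the two response tokens stay lower-triangular, so generation remains autoregressive.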

Topics

Deep Learning > Architectures > Transformers
Deep Learning > Techniques > Model Architecture
Natural Language Processing > Resources & Methods > Large Language Models
Artificial Intelligence > Core AI > Large Language Models
Machine Learning > Learning Types > Fine-Tuning

Keywords

attention mechanism; bidirectional attention; instruction tuning; parameter-efficient finetuning; decoder-only model; decoder-only architecture; large language model

Related papers

Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework 2025

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing 2025

Model-based Large Language Model Customization as Service 2025

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration 2025

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design 2025