DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers

Xueguang Ma; Xi Victoria Lin; Barlas Oguz; Jimmy Lin; Wen-tau Yih; Xilun Chen

2025 ACL ACL 2025

DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers

Abstract

AbstractLarge language models (LLMs) have demonstrated strong effectiveness and robustness when fine-tuned as dense retrievers.However, their large parameter size presents significant computational challenges at inference time.While smaller retrievers offer better efficiency, they often fail to generalize effectively with limited supervised fine-tuning data.In this work, we introduce DRAMA, a training framework that leverages LLMs to train smaller generalizable dense retrievers.In particular, we adopt pruned LLMs as the backbone and train on diverse LLM-augmented data in a single-stage contrastive learning setup.Experiments show that DRAMA offers better multilingual and long-context capabilities than traditional encoder-based retrievers, and achieves strong performance across multiple tasks and languages.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — dense retriever

🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Speech & Audio