Beyond Sample-Level Feedback: Using Reference-Level Feedback to Guide Data Synthesis

Shuhaib Mehri; Xiusi Chen; Heng Ji; Dilek Hakkani-Tur

2026 EACL EACL 2026

Beyond Sample-Level Feedback: Using Reference-Level Feedback to Guide Data Synthesis

Abstract

AbstractHigh-quality instruction-tuning data is crucial for developing Large Language Models (LLMs) that can effectively navigate real-world tasks and follow human instructions. While synthetic data generation offers a scalable approach for creating such datasets, it imposes a quality ceiling where models trained on the data cannot outperform the LLM generating it. To overcome this limitation, we introduce Reference-Level Feedback, a paradigm that extracts desirable characteristics from carefully curated reference samples to guide the synthesis of higher-quality instruction-response pairs. Using this approach, we synthesize REFED, a dataset of 10K instruction-response pairs. Fine-tuning Llama-3.1-8B-Instruct and Mistral-7B-Instruct on REFED demonstrate state-of-the-art performance among similarly sized models, notably reaching a 43.96% length-controlled win-rate on AlpacaEval 2.0. Extensive experiments demonstrate that Reference-Level Feedback consistently outperforms traditional sample-level feedback methods, generalizes across model architectures, and produces high-quality and diverse data at low cost.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — reference-level feedback

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Shuhaib Mehri , Xiusi Chen , Heng Ji , Dilek Hakkani-Tur

Topics

Artificial Intelligence > Core AI > Foundation Models Machine Learning > Application Areas > Domain Adaptation Natural Language Processing > Resources & Methods > Large Language Models

Keywords

synthetic data generation data synthesis language model fine-tuning instruction-tuning datum reference-level feedback

Download PDF

Related papers

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection 2026

Generative Personality Simulation via Theory-Informed Structured Interview 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs 2026