MDPO: Customized Direct Preference Optimization with a Metric-based Sampler for Question and Answer Generation

Yihang Wang; Bowen Tian; Yueyang Su; Yixing Fan; Jiafeng Guo

COLING 2025


Abstract

With the widespread use of large language models, automatically generating QA datasets for domain-specific fine-tuning has become crucial. However, given the multifaceted requirements of QA data (readability, diversity, and comprehensiveness), current methods fall short of producing high-quality QA datasets. Moreover, because existing evaluation metrics depend on ground-truth labels, selecting QA data becomes even harder. In this paper, we introduce MDPO, a novel method for QA data generation. We propose a set of unsupervised evaluation metrics for QA data that enable multidimensional assessment based on the relationships among context, question, and answer. Building on these metrics, we implement a customized direct preference optimization process that guides large language models to produce high-quality, domain-specific QA pairs. Empirical results on public datasets indicate that MDPO substantially outperforms state-of-the-art methods.
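The abstract does not say how metric scores are turned into preference data. The sketch below is illustrative only and is not the authors' implementation. It assumes that each context already has several candidate QA pairs scored by some unsupervised metric. The `score` callable, the function names, and the example scores are hypothetical stand-ins, not MDPO's metrics. The sketch forms a chosen/rejected pair from the best- and worst-scoring candidates and then evaluates the standard, uncustomized DPO loss on precomputed sequence log-probabilities. MDPO's customized objective is not reproduced here.

```python
import math
from typing import Callable, Optional, Sequence, Tuple


def build_preference_pair(
    candidates: Sequence[str],
    score: Callable[[str], float],
    min_margin: float = 0.0,
) -> Optional[Tuple[str, str]]:
    """Hypothetical metric-based sampler (illustration only).

    Ranks candidate QA pairs for one context by `score` and returns
    (chosen, rejected) = (highest-scoring, lowest-scoring). Returns None if
    there are fewer than two candidates or the score gap is <= min_margin.
    """
    if len(candidates) < 2:
        return None
    ranked = sorted(candidates, key=score, reverse=True)
    chosen, rejected = ranked[0], ranked[-1]
    if score(chosen) - score(rejected) <= min_margin:
        return None
    return chosen, rejected


def dpo_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one pair, given sequence log-probabilities.

    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(m)) == log(1 + exp(-m)); split by sign to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Usage with made-up scores standing in for an unsupervised QA metric.
scores = {"qa_1": 0.82, "qa_2": 0.35, "qa_3": 0.61}
pair = build_preference_pair(list(scores), lambda c: scores[c])
print(pair)  # ('qa_1', 'qa_2')
print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

The sketch uses `min_margin` to drop near-tied pairs. This is a design choice of the sketch, on the reasoning that candidates whose scores barely differ carry little preference signal. The loss is below log 2 whenever the policy favors the chosen answer more than the reference model does.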



Topics

Artificial Intelligence > Core AI > Foundation Models
Machine Learning > Learning Types > Self-Supervised Learning
Natural Language Processing > Applications > Question Answering

Keywords

direct preference optimization, domain-specific fine-tuning, question answer generation, unsupervised evaluation metric, metric-based sampler

