Benchmarking Direct Preference Optimization for Medical Large Vision–Language Models

Dain Kim; Jiwoo Lee; Jaehoon Yun; Yong Hoe Koo; Qingyu Chen; Hyunjae Kim; Jaewoo Kang

2026 EACL EACL 2026

Benchmarking Direct Preference Optimization for Medical Large Vision–Language Models

Abstract

AbstractLarge vision-language models (LVLMs) are gaining traction in clinical tasks such as diagnostic support, report generation, and medical question answering. Among post-training techniques, Direct Preference Optimization (DPO) has shown promise in aligning model outputs with human preferences, yet its effectiveness in high-stakes medical contexts remains underexplored. In this work, we present the first systematic evaluation of nine DPO variants applied to two leading medical LVLMs, LLaVA-Med and HuatuoGPT-Vision. We benchmark these models on five curated datasets covering diverse clinical tasks. Evaluations include both automated metrics and expert assessments. Our results show that while DPO improves alignment and reduces severe hallucinations, it yields inconsistent gains over supervised fine-tuning. We further introduce DPO variant that better handles visual misinterpretations and enhances clinical understanding. These findings reveal both the potential and limitations of DPO in medical AI. To support future research, we will release all DPO training data, model checkpoints, and expert annotations upon acceptance.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Speech & Audio

Authors

Dain Kim , Jiwoo Lee , Jaehoon Yun , Yong Hoe Koo , Qingyu Chen , Hyunjae Kim , Jaewoo Kang

Topics

Artificial Intelligence > Core AI > Multimodal Learning Machine Learning > Application Areas > Model Merging

Keywords

direct preference optimization human preference alignment clinical task medical vision-language model

Download PDF

Related papers

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection 2026

Generative Personality Simulation via Theory-Informed Structured Interview 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs 2026