Assessing and Mitigating Medical Knowledge Drift and Conflicts in Large Language Models

Weiyi Wu; Xinwen Xu; Chongyang Gao; Xingjian Diao; Siting Li; Lucas A. Salas; Jiang Gui

EMNLP 2025

Abstract

Large Language Models (LLMs) offer transformative potential across diverse fields, yet their safe and effective deployment is hindered by inherent knowledge conflicts stemming from temporal evolution, divergent sources, and contradictory guidelines. This challenge is particularly acute in medicine, an interdisciplinary frontier for NLP. Rapid medical concept drift can lead LLMs to provide incorrect or outdated advice, impacting their utility and the broader societal benefits of NLP advances. This study introduces ConflictMedQA, a benchmark designed to systematically evaluate how LLMs manage varied knowledge conflicts in clinical guidelines. Our assessment of seven state-of-the-art models across 4,290 scenarios reveals significant difficulties in rejecting incorrect recommendations and frequent endorsement of conflicting advice, highlighting an important gap for NLP systems intended for real-world impact. We explore two fundamental mitigation approaches: retrieval-augmented generation and preference fine-tuning via direct preference optimization. While each offers improvements, their synergistic combination yields the best results. These findings emphasize the need for LLMs to discern subtle but critical guideline conflicts. This is a crucial step in advancing NLP's capabilities and ensuring its dependable application in critical societal domains. The proposed dataset is available at https://huggingface.co/datasets/RDBH/DriftMed.
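The abstract names direct preference optimization (DPO) as one mitigation but does not describe the training setup. As a rough illustration only, the standard per-example DPO objective can be sketched as below. The function name, argument names, and the default `beta` are assumptions made for this sketch and are not taken from the paper. Here "chosen" would be the response that follows the current guideline and "rejected" the one that endorses outdated or conflicting advice.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under a frozen reference model. beta=0.1 is
    an assumed default, not a value reported in the paper.
    """
    # How much more the policy prefers each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)

    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2. The loss falls as the policy shifts probability toward the chosen response relative to the reference.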

Topics

Artificial Intelligence > Core AI > Foundation Models
Natural Language Processing > Resources & Methods > Large Language Models
Deep Learning > Learning Types > Retrieval-Augmented Generation
Healthcare & Medicine > Clinical > Medical NLP
Deep Learning > Learning Types > Reinforcement Learning from Human Feedback

Keywords

direct preference optimization; medical NLP; retrieval-augmented generation; medical domain; medical knowledge; knowledge conflict; preference fine-tuning; large language model; knowledge drift
