LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models

Hieu Tran; Junda Wang; Yujan Ting; Hong Yu; Weijing Huang; Terrence Chen

2025 EMNLP EMNLP 2025

LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models

Abstract

AbstractLarge language models (LLMs) often struggle with factual accuracy in knowledge-intensive domains like healthcare. We introduce LEAF (Learning and Evaluation Augmented by Fact-Checking), a framework for improving LLM factuality in medical question answering. LEAF comprises three components: (1) RAFE, a robust fact-checking system using open-source LLMs and domain-specific retrieval to evaluate response accuracy; (2) Fact-Check-then-RAG, which leverages fact-checking results to guide retrieval without parameter updates; and (3) Learning from Fact Check, enabling self-training through supervised fine-tuning or preference-based learning using fact-checking as pseudo-labels. Experimental results show that RAFE outperforms Factcheck-GPT in detecting inaccuracies, Fact-Check-then-RAG effectively corrects errors, and Learning from Fact Check improves performance without labeled data. In a real-world healthcare deployment with proprietary medical documents, LEAF achieved an 83% improvement in factuality scores, demonstrating practical applicability for adapting general-purpose LLMs to organization-specific knowledge. Our framework provides a scalable solution for industrial applications requiring high factual accuracy.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Healthcare & Medicine and Machine Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Hieu Tran , Junda Wang , Yujan Ting , Hong Yu , Weijing Huang , Terrence Chen

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning Machine Learning > Learning Types > Self-Supervised Learning Natural Language Processing > Generation > Language Modeling Natural Language Processing > Applications > Fact-Checking Natural Language Processing > Applications > Question Answering Healthcare & Medicine > Clinical > Clinical NLP Artificial Intelligence > Core AI > Large Language Models Deep Learning > Models > Large Language Models Healthcare & Medicine > Clinical > Medical AI Machine Learning > Learning Types > Retrieval-Augmented Generation

Keywords

factual accuracy supervised fine-tuning preference-based learning retrieval-augmented generation medical question answering large language model

Download PDF

Related papers

Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework 2025

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing 2025

Model-based Large Language Model Customization as Service 2025

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration 2025

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design 2025