2024 EMNLP EMNLP 2024

RAG-HAT: A Hallucination-Aware Tuning Pipeline for LLM in Retrieval-Augmented Generation

Abstract

AbstractRetrieval-augmented generation (RAG) has emerged as a significant advancement in the field of large language models (LLMs). By integrating up-to-date information not available during their initial training, RAG greatly enhances the practical utility of LLMs in real-world applications. However, even with RAG, LLMs can still produce inaccurate outputs, such as distorting or misinterpreting source content, posing risks in high-trust scenarios. To address these issues, we introduce a novel approach called Hallucination Aware Tuning (HAT). This method involves training hallucination detection models that generate detection labels and provide detailed descriptions of the detected hallucinations. Utilizing these detection results—particularly the hallucination descriptions—GPT-4 Turbo is employed to correct any detected hallucinations. The corrected outputs, free of hallucinations, along with the original versions, are used to create a preference dataset for Direct Preference Optimization (DPO) training. The fine-tuning through DPO leads to LLMs that exhibit a reduced rate of hallucinations and deliver improved answer quality.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Natural Language Processing
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio