HalluDetect: Detecting, Mitigating, and Benchmarking Hallucinations in Conversational Systems in the Legal Domain

Spandan Anaokar; Shrey Ganatra; Harshvivek Kashid; Swapnil Bhattacharyya; Shruthi Nair; Reshma Sekhar; Siddharth Manohar; Rahul Hemrajani; Pushpak Bhattacharyya

EMNLP 2025

Abstract

Large Language Models (LLMs) are widely used in industry but remain prone to hallucinations, which limits their reliability in critical applications. This work addresses hallucination reduction in consumer grievance chatbots built using LLaMA 3.1 8B Instruct, a compact model frequently used in industry. We develop **HalluDetect**, an LLM-based hallucination detection system that achieves an F1 score of **68.92%**, outperforming baseline detectors by **22.47%**. We benchmark five hallucination mitigation architectures. Among them, AgentBot is the most effective: it reduces hallucinations to **0.4159** per turn while maintaining the highest token accuracy (**96.13%**). Our findings provide a scalable framework for hallucination mitigation and show that optimized inference strategies can significantly improve factual accuracy.
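
The abstract reports two kinds of numbers. One is a detector F1 score. The other is a per-turn hallucination rate used to compare mitigation architectures. The sketch below is a minimal illustration of how such metrics are commonly computed. It is not the authors' evaluation code. It assumes that each response carries a binary gold label ("hallucinated or not") and a binary detector prediction. It also assumes that "hallucinations per turn" means the total hallucination count divided by the number of turns. The paper's actual annotation unit and definitions may differ. The function names `detector_f1` and `hallucinations_per_turn` are hypothetical.

```python
from typing import Sequence


def detector_f1(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """F1 of a binary hallucination detector (illustrative sketch).

    Assumption: gold[i] is True if response i was annotated as hallucinated,
    and predicted[i] is True if the detector flagged it.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # correctly flagged
    fp = sum(1 for g, p in pairs if not g and p)    # flagged, but not hallucinated
    fn = sum(1 for g, p in pairs if g and not p)    # hallucinated, but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def hallucinations_per_turn(counts: Sequence[int]) -> float:
    """Mean hallucination count per conversation turn (assumed definition)."""
    if not counts:
        raise ValueError("need at least one turn")
    return sum(counts) / len(counts)


if __name__ == "__main__":
    gold = [True, True, False, False]
    predicted = [True, False, True, False]
    print(detector_f1(gold, predicted))            # 0.5
    print(hallucinations_per_turn([0, 1, 0, 1]))   # 0.5
```

Under these assumptions, one true positive, one false positive and one false negative give a precision and recall of 0.5, so F1 is 0.5. A lower per-turn rate is better when comparing mitigation architectures.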

Keywords

factual accuracy, hallucination mitigation, legal domain, hallucination detection, conversational agent, conversational system, mitigation strategy, large language model, token accuracy
