Deloitte (Drocks) at SemEval-2025 Task 3: Fine-Grained Multi-lingual Hallucination Detection Using Internal LLM Weights

Alex Chandler; Harika Abburi; Sanmitra Bhattacharya; Edward Bowen; Nirmala Pudota

SemEval 2025

Abstract

Large Language Models (LLMs) have greatly advanced the field of Natural Language Generation (NLG). Despite their remarkable capabilities, their tendency to hallucinate, that is, to produce inaccurate or misleading information, remains a barrier to wider adoption. Current hallucination detection methods mainly employ coarse-grained binary classification at the sentence or document level, overlooking the need for precise identification of the specific text spans containing hallucinations. In this paper, we propose a methodology that generates supplementary context and processes text using an LLM to extract internal weights (features) from various layers. These extracted features serve as input to a neural network classifier designed to perform token-level binary detection of hallucinations. Subsequently, we map the resulting token-level predictions to character-level predictions, enabling the identification of spans of hallucinated text, which we refer to as hallucination spans. Our model achieved a top-ten ranking in 13 of the 14 languages and secured first place for French in SemEval-2025 Task 3, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes (Mu-SHROOM), using the Mu-SHROOM dataset provided by the task organizers.
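The last step described in the abstract, mapping token-level predictions to character-level hallucination spans, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `tokens_to_char_spans`, the whitespace tokenization, and the hand-written token labels are all assumptions made for illustration. In practice the offsets would come from the LLM's tokenizer and the labels from the token-level classifier.

```python
import re
from typing import List, Tuple


def tokens_to_char_spans(
    offsets: List[Tuple[int, int]],
    token_labels: List[int],
    text_len: int,
) -> List[Tuple[int, int]]:
    """Turn token-level binary labels into character-level [start, end) spans.

    Hypothetical helper: `offsets` holds the character range of each token,
    and `token_labels` holds 1 for tokens predicted as hallucinated, else 0.
    """
    # Step 1: flag every character covered by a token labeled 1.
    char_flags = [False] * text_len
    for (start, end), label in zip(offsets, token_labels):
        if label == 1:
            for i in range(start, min(end, text_len)):
                char_flags[i] = True

    # Step 2: collapse consecutive flagged characters into spans.
    spans: List[Tuple[int, int]] = []
    span_start = None
    for i, flagged in enumerate(char_flags):
        if flagged and span_start is None:
            span_start = i
        elif not flagged and span_start is not None:
            spans.append((span_start, i))
            span_start = None
    if span_start is not None:
        spans.append((span_start, text_len))
    return spans


# Toy input: whitespace tokens stand in for a real tokenizer (assumption).
text = "Paris is the capital of Germany."
offsets = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]

# Assumed classifier output: only the last token is flagged.
token_labels = [0, 0, 0, 0, 0, 1]
spans = tokens_to_char_spans(offsets, token_labels, len(text))
flagged_text = [text[s:e] for s, e in spans]
```

In this sketch, characters that no flagged token covers (such as the space between two flagged whitespace tokens) split a span in two. Whether such gaps should be bridged is a design choice that depends on how the evaluation scores span boundaries.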
