SHIFT: Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models

Sudong Wang; Yunjian Zhang; Yao Zhu; Enci Liu; Jianing Li; Yanwei Liu; Xiangyang Ji

ICCV 2025

Abstract

Large Language Models (LLMs) are prone to hallucinations, which pose significant risks in their applications. Most existing hallucination detection methods rely on internal probabilities or external knowledge, and they are limited to identifying hallucinations at the sentence or passage level. In this paper, we introduce the first token-level, zero-resource hallucination detection framework, leveraging a novel approach inspired by the Mad Libs game. This method assesses the reliability of the input text by evaluating the consistency of information before and after the game. Building on this framework, we also propose an innovative automated hallucination generation technique and introduce a high-quality hallucination dataset, HalluWiki. Extensive experiments demonstrate that our approach achieves over 90% detection accuracy across different levels, establishing a new frontier in hallucination detection for LLMs.

Topics

Artificial Intelligence > Core AI > Interpretability

Artificial Intelligence > Core AI > Multimodal Learning

Artificial Intelligence > Core AI > Large Language Models

Natural Language Processing > Applications > Text Generation

Keywords

text generation, multimodal learning, multimodal large language model, hallucination detection, information flow, token-level prediction, token-level detection, large language model
