Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations

Shuo Li; Jiajun Sun; Guodong Zheng; Xiaoran Fan; Yujiong Shen; Yi Lu; Zhiheng Xi; Yuming Yang; Wenming Tan; Tao Ji; Tao Gui; Qi Zhang; Xuanjing Huang

EMNLP 2025

Abstract

Recently, multimodal large language models (MLLMs) have demonstrated remarkable performance in visual-language tasks. However, the authenticity of the responses generated by MLLMs is often compromised by object hallucinations. We identify that a key cause of these hallucinations is the model’s over-susceptibility to image frequency features in detecting objects. In this paper, we introduce Multi-Frequency Perturbations (MFP), a simple, cost-effective, and pluggable adversarial training method that leverages both low-frequency and high-frequency features of images to perturb visual feature representations and explicitly suppress redundant frequency-domain features during inference, thereby mitigating hallucinations. Experimental results demonstrate that our method significantly mitigates object hallucinations across various model architectures. Furthermore, as a training-time method, MFP can be combined with inference-time methods to achieve state-of-the-art performance on the CHAIR benchmark.
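To make the notion of low-frequency and high-frequency image features concrete, the sketch below splits a grayscale image into the two components with a 2-D FFT and an ideal circular mask. It is an illustration only, not the authors' implementation: the abstract does not say how MFP extracts the components or how it applies them as perturbations to visual features, and the function name `split_frequencies` and the `radius` cutoff are assumptions introduced here.

```python
import numpy as np


def split_frequencies(image, radius):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    `radius` (a hypothetical parameter) is the cutoff, in frequency bins,
    of an ideal circular low-pass mask centered on the DC component.
    Returns a (low, high) pair of arrays with the same shape as `image`.
    """
    h, w = image.shape
    # Move the zero-frequency (DC) term to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Distance of every frequency bin from the spectrum center.
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius

    # Keep bins inside the mask for the low band, outside it for the high band.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high
```

Because the two masks are complementary, `low + high` reconstructs the input up to floating-point error. A smaller `radius` pushes more of the image's structure (edges, texture) into the high-frequency part.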

Topics

Artificial Intelligence > Core AI > Multimodal Learning; Machine Learning > Learning Types > Adversarial Learning; Machine Learning > Optimization & Theory > Neural Network Optimization; Computer Vision > Processing > Image Processing; Deep Learning > Learning Types > Adversarial Learning; Deep Learning > Learning Types > Multimodal Learning

Keywords

adversarial training; image processing; multimodal large language model; frequency domain; object hallucination; visual feature; frequency perturbation
