Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization

Zefeng Zhang; Hengzhu Tang; Jiawei Sheng; Zhenyu Zhang; Yiming Ren; Zhenyang Li; Dawei Yin; Duohe Ma; Tingwen Liu

CVPR 2025

Abstract

Multimodal Large Language Models (MLLMs) excel in various tasks, yet often struggle with modality bias: they tend to rely heavily on a single modality or on prior knowledge when generating responses. In this paper, we propose a debiased preference optimization dataset, RLAIF-V-Bias, and introduce a Noise-Aware Preference Optimization (NAPO) algorithm. Specifically, we first construct the dataset by introducing perturbations that reduce the informational content of certain modalities, prompting the model to rely excessively on a specific modality when generating responses. To address the inevitable noise in automatically constructed data, we combine the noise-robust Mean Absolute Error (MAE) with the Binary Cross-Entropy (BCE) in Direct Preference Optimization (DPO) using a negative Box-Cox transformation, and dynamically adjust the algorithm's noise robustness based on the evaluated noise levels in the data. Extensive experiments validate our approach, demonstrating not only its effectiveness in mitigating modality bias but also its significant role in minimizing hallucinations.
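
The abstract does not state the loss formula, so the following is only a minimal sketch of how a negative Box-Cox transformation can interpolate between BCE and MAE on a DPO preference probability. It assumes the standard generalized cross-entropy form (1 - p^q) / q, which tends to -log p (BCE) as q approaches 0 and equals 1 - p (MAE) at q = 1. The function names, the `beta` default, and the treatment of `q` as a plain argument are illustrative assumptions, not the paper's implementation; how NAPO sets its robustness level from estimated noise is not described here and is left out.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def preference_prob(policy_chosen: float, policy_rejected: float,
                    ref_chosen: float, ref_rejected: float,
                    beta: float = 0.1) -> float:
    """DPO-style probability that the chosen response beats the rejected one.

    Inputs are sequence log-probabilities under the policy and the frozen
    reference model. beta is the usual DPO temperature (value assumed here).
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return sigmoid(beta * margin)


def negative_box_cox_loss(p: float, q: float) -> float:
    """Assumed form (1 - p**q) / q for q in [0, 1].

    q -> 0 recovers -log(p) (BCE on the preferred label);
    q = 1 gives 1 - p (MAE), which is bounded and so less sensitive
    to mislabeled preference pairs.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    p = max(p, 1e-12)  # guard against log(0)
    if q < 1e-6:
        return -math.log(p)
    return (1.0 - p ** q) / q


if __name__ == "__main__":
    # Hypothetical log-probabilities for one preference pair.
    p = preference_prob(-5.0, -9.0, -6.0, -8.0, beta=0.5)
    for q in (0.0, 0.5, 1.0):
        print(f"q={q}: loss={negative_box_cox_loss(p, q):.4f}")
```

Under this assumed form, a larger q caps the penalty on pairs the model strongly disagrees with, which is the usual motivation for leaning toward MAE when labels are noisy, while a smaller q keeps the faster-converging BCE behavior on cleaner data.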

Authors

Zefeng Zhang; Hengzhu Tang; Jiawei Sheng; Zhenyu Zhang; Yiming Ren; Zhenyang Li; Dawei Yin; Duohe Ma; Tingwen Liu

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Application Areas > Fairness
Natural Language Processing > Resources & Methods > Large Language Models
Deep Learning > Models > Large Language Models
Deep Learning > Learning Types > Reinforcement Learning
Deep Learning > Learning Types > Multi-Modal Learning

Keywords

direct preference optimization; preference optimization; multimodal large language model; hallucination reduction; noise-aware learning; modality bias; noise-aware preference optimization; modality bias mitigation
