Error Detection for Multimodal Classification

Thomas Bonnier

2025 NAACL NAACL 2025

Error Detection for Multimodal Classification

Abstract

AbstractMachine learning models have proven to be useful in various key applications such as autonomous driving or diagnosis prediction. When a model is implemented under real-world conditions, it is thus essential to detect potential errors with a trustworthy approach. This monitoring practice will render decision-making safer by avoiding catastrophic failures. In this paper, the focus is on multimodal classification. We introduce a method that addresses error detection based on unlabeled data. It leverages fused representations and computes the probability that a model will fail based on detected fault patterns in validation data. To improve transparency, we employ a sampling-based approximation of Shapley values in multimodal settings in order to explain why a prediction is assessed as erroneous in terms of feature values. Further, as explanation methods can sometimes disagree, we suggest evaluating the consistency of explanations produced by different value functions and algorithms. To show the relevance of our method, we measure it against a selection of 9 baselines from various domains on tabular-text and text-image datasets, and 2 multimodal fusion strategies for the classification models. Lastly, we show the usefulness of our explanation algorithm on misclassified samples.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio