FunghiFunghi at SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes

Tariq Ballout; Pieter Jansma; Nander Koops; Yong Hui Zhou

ACL 2025

Abstract

Large Language Models (LLMs) often generate hallucinated content, which is factually incorrect or misleading, posing reliability challenges. The Mu-SHROOM shared task addresses hallucination detection in multilingual LLM-generated text. This study employs SpanBERT, a transformer model optimized for span-based predictions, to identify hallucinated spans across multiple languages. To address limited training data, we apply dataset augmentation through translation and synthetic generation. The model is evaluated using Intersection over Union (IoU) for span detection and Spearman’s correlation for ranking consistency. While the model detects hallucinated spans with moderate accuracy, it struggles with ranking confidence scores. These findings highlight the need for improved probability calibration and multilingual robustness. Future work should refine ranking methods and explore ensemble models for better performance.
