Dialectal Toxicity Detection: Evaluating LLM-as-a-Judge Consistency Across Language Varieties

Fahim Faisal; Md Mushfiqur Rahman; Antonios Anastasopoulos

EMNLP 2025

Abstract

There has been little systematic study on how dialectal differences affect toxicity detection by modern LLMs. Furthermore, although using LLMs as evaluators (“LLM-as-a-judge”) is a growing research area, their sensitivity to dialectal nuances is still underexplored and requires more focused attention. In this paper, we address these gaps through a comprehensive toxicity evaluation of LLMs across diverse dialects. We create a multi-dialect dataset through synthetic transformations and human-assisted translations, covering 10 language clusters and 60 varieties. We then evaluate five LLMs on their ability to assess toxicity, measuring multilingual, dialectal, and LLM-human consistency. Our findings show that LLMs are sensitive to both dialectal shifts and low-resource multilingual variation, though the most persistent challenge remains aligning their predictions with human judgments.

Topics

Artificial Intelligence > Core AI > Interpretability; Interdisciplinary > Social > Affective Computing; Natural Language Processing > Applications > Sentiment Analysis; Artificial Intelligence > Core AI > Large Language Models; Machine Learning > Learning Types > Fairness

Keywords

multilingual NLP; human alignment; language variety; dialectal toxicity detection; multilingual variation; human judgment alignment; dialectal toxicity
