Analyzing Dialectical Biases in LLMs for Knowledge and Reasoning Benchmarks

Eileen Pan; Anna Seo Gyeong Choi; Maartje Ter Hoeve; Skyler Seto; Allison Koenecke

EMNLP 2025

Abstract

Large language models (LLMs) are ubiquitous in modern-day natural language processing. However, previous work has shown degraded LLM performance for under-represented English dialects. We analyze the effects of typifying “standard” American English questions as non-“standard” dialectal variants on multiple-choice question answering tasks and find up to a 20% reduction in accuracy. Additionally, we investigate the grammatical basis of under-performance on non-“standard” English questions. We find that individual grammatical rules have varied effects on performance, but some are more consequential than others: three specific grammar rules (existential “it”, zero copula, and “y’all”) can explain the majority of the performance degradation observed across multiple dialects. We call for future work to investigate bias mitigation methods focused on individual, high-impact grammatical structures.

Topics

Artificial Intelligence > Core AI > Responsible AI; Machine Learning > Application Areas > Fairness; Natural Language Processing > Applications > Text Classification

Keywords

language model evaluation; reasoning benchmark; fairness in NLP; dialect bias; grammatical analysis
