From Facts to Folklore: Evaluating Large Language Models on Bengali Cultural Knowledge

Nafis Chowdhury; Moinul Haque; Anika Ahmed; Nazia Tasnim; Md. Istiak Hossain Shihab; Sajjadur Rahman; Farig Sadeque

AACL 2025

Abstract

Recent progress in NLP research has demonstrated remarkable capabilities of large language models (LLMs) across a wide range of tasks. While recent multilingual benchmarks have advanced cultural evaluation for LLMs, critical gaps remain in capturing the nuances of low-resource cultures. Our work addresses these limitations through a Bengali Language Cultural Knowledge (BLanCK) dataset covering folk traditions, culinary arts, and regional dialects. Our investigation of several multilingual language models shows that while these models perform well in non-cultural categories, they struggle significantly with cultural knowledge. Performance improves substantially across all models when context is provided, underscoring the importance of context-aware architectures and culturally curated training data.

Topics

Artificial Intelligence > Core AI > Foundation Models
Artificial Intelligence > Core AI > Interpretability
Machine Learning > Application Areas > Domain Adaptation
Machine Learning > Application Areas > Fairness
Natural Language Processing > Applications > Fact-Checking
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Large Language Models
Natural Language Processing > Resources & Methods > Multilingual NLP

Keywords

benchmark evaluation, multilingual NLP, low-resource language, cultural knowledge, multilingual model, cultural evaluation, Bengali language, large language model, context-aware architecture, low-resource culture, folk tradition
