NativQA: Multilingual Culturally-Aligned Natural Query for LLMs

Md. Arid Hasan; Maram Hasanain; Fatema Ahmad; Sahinur Rahman Laskar; Sunaya Upadhyay; Vrunda N Sukhadia; Mucahid Kutlu; Shammur Absar Chowdhury; Firoj Alam

ACL 2025

Abstract

Natural Question Answering (QA) datasets play a crucial role in evaluating the capabilities of large language models (LLMs), ensuring their effectiveness in real-world applications. Despite the numerous QA datasets that have been developed and some work done in parallel, there is a notable lack of a framework and large-scale region-specific datasets queried by native users in their own languages. This gap hinders effective benchmarking and the development of fine-tuned models for regional and cultural specificities. In this study, we propose a scalable, language-independent framework, NativQA, to seamlessly construct culturally and regionally aligned QA datasets in native languages for LLM evaluation and tuning. We demonstrate the efficacy of the proposed framework by designing a multilingual natural QA dataset, MultiNativQA, consisting of approximately 64K manually annotated QA pairs in seven languages, ranging from high- to extremely low-resource, based on queries from native speakers from 9 regions covering 18 topics. We benchmark both open- and closed-source LLMs using the MultiNativQA dataset. The dataset and related experimental scripts are publicly available for the community at: https://huggingface.co/datasets/QCRI/MultiNativQA and https://gitlab.com/nativqa/multinativqa.

Topics

Natural Language Processing > Applications > Question Answering
Natural Language Processing > Resources & Methods > Large Language Models
Natural Language Processing > Resources & Methods > Multilingual NLP

Keywords

benchmark dataset, native language, cultural alignment, multilingual question answering, large language model
