BEnQA: A Question Answering Benchmark for Bengali and English

Sheikh Shafayat; H M Quamran Hasan; Minhajur Rahman Chowdhury Mahim; Rifki Afina Putri; James Thorne; Alice Oh

2024 ACL ACL 2024

BEnQA: A Question Answering Benchmark for Bengali and English

Abstract

AbstractIn this study, we introduce BEnQA, a dataset comprising parallel Bengali and English exam questions for middle and high school levels in Bangladesh. Our dataset consists of approximately 5K questions covering several subjects in science with different types of questions, including factual, application, and reasoning-based questions. We benchmark several Large Language Models (LLMs) with our parallel dataset and observe a notable performance disparity between the models in Bengali and English. We also investigate some prompting methods, and find that Chain-of-Thought prompting is beneficial mostly on reasoning questions, but not so much on factual ones. We also find that appending English translation helps to answer questions in Bengali. Our findings point to promising future research directions for improving the performance of LLMs in Bengali and more generally in low-resource languages.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Sheikh Shafayat , H M Quamran Hasan , Minhajur Rahman Chowdhury Mahim , Rifki Afina Putri , James Thorne , Alice Oh

Topics

Natural Language Processing > Applications > Question Answering Natural Language Processing > Applications > Text Classification Natural Language Processing > Resources & Methods > Multilingual NLP Machine Learning > Learning Types > Prompt Engineering

Keywords

multilingual nlp question answering low-resource language chain-of-thought prompting large language model

Download PDF

Related papers

Reinforcement Learning-Driven LLM Agent for Automated Attacks on LLMs 2024

EtymoLink: A Structured English Etymology Dataset 2024

Turkish Delights: A Dataset on Turkish Euphemisms 2024

Subjectivity Detection in English News using Large Language Models 2024

Does DetectGPT Fully Utilize Perturbation? Bridging Selective Perturbation to Fine-tuned Contrastive Learning Detector would be Better 2024