Evaluating Retrieval-Augmented Generation for Medication Question Answering on Nigerian Drug Labels in Yorùbá
Abstract
Large Language Models (LLMs) have the potential to improve access to healthcare information in Nigeria, but they risk generating unsafe or inaccurate responses when used in low-resource languages such as Yorùbá. Retrieval-Augmented Generation (RAG) has emerged as a promising approach to mitigating hallucinations by grounding LLM outputs in verified knowledge sources. To assess its effectiveness in low-resource settings, we construct a controlled Yorùbá question-answering (QA) dataset derived from Nigerian drug labels, comprising 460 question–answer pairs across 92 drugs. We use it to evaluate the impact of three retrieval strategies: hybrid lexical–semantic retrieval, Hypothetical Document Embeddings (HyDE), and Cross-Encoder re-ranking. Our results show that hybrid retrieval, which combines lexical and semantic signals, generally yields more reliable and clinically accurate responses, whereas more advanced re-ranking approaches show inconsistent improvements. These findings underscore the importance of effective retrieval design for safe and trustworthy multilingual healthcare QA systems.
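To make "combining lexical and semantic signals" concrete, the sketch below fuses a lexical ranking (e.g. from BM25) and a semantic ranking (e.g. from embedding similarity) into one list. The abstract does not say which fusion rule the authors used. Reciprocal rank fusion (RRF) is shown here only as one common, assumed choice. The function name `reciprocal_rank_fusion`, the constant `k = 60`, and the document IDs in the usage example are hypothetical.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Assumption: this is an illustrative fusion rule, not necessarily the
    paper's method. Each document scores sum(1 / (k + rank)) over every
    list it appears in, where rank is 1-based.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical drug-label chunk IDs returned by each retriever, best first.
lexical = ["label_12", "label_07", "label_33"]
semantic = ["label_07", "label_45", "label_12"]
fused = reciprocal_rank_fusion([lexical, semantic])
print(fused)  # ['label_07', 'label_12', 'label_45', 'label_33']
```

Because RRF uses only ranks, it sidesteps the fact that lexical and embedding scores live on different scales. A documents that both retrievers rank highly (here `label_07`) rises to the top even if neither ranks it first on its own.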