AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages

Odunayo Ogundepo; Tajuddeen R. Gwadabe; Clara E. Rivera; Jonathan H. Clark; Sebastian Ruder; David Ifeoluwa Adelani; Bonaventure F. P. Dossou; Abdou Aziz Diop; Claytone Sikasote; Gilles Hacheme; Happy Buzaaba; Ignatius Ezeani; Rooweither Mabuya; Salomey Osei; Chris Emezue; Albert Njoroge Kahira; Shamsuddeen Hassan Muhammad; Akintunde Oladipo; Abraham Toluwase Owodunni; Atnafu Lambebo Tonja; Iyanuoluwa Shode; Akari Asai; Tunde Oluwaseyi Ajayi; Clemencia Siro; Steven Arthur; Mofetoluwa Adeyemi; Orevaoghene Ahia; Anuoluwapo Aremu; Oyinkansola Awosan; Chiamaka Chukwuneke; Bernard Opoku; Awokoya Ayodele; Verrah Otiende; Christine Mwase; Boyd Sinkala; Andre Niyongabo Rubungo; Daniel A. Ajisafe; Emeka Felix Onwuegbuzia; Habib Mbow; Emile Niyomutabazi; Eunice Mukonde; Falalu Ibrahim Lawan; Ibrahim Said Ahmad; Jesujoba O. Alabi; Martin Namukombo; Mbonu Chinedu; Mofya Phiri; Neo Putini; Ndumiso Mngoma; Priscilla A. Amouk; Ruqayya Nasir Iro; Sonia Adhiambo

EMNLP 2023

Abstract

African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems, which retrieve answer content from other languages while serving people in their native language, offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, AfriQA focuses on languages where cross-lingual answer content is the only high-coverage source of answer content. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, AfriQA proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.

Topics

Natural Language Processing > Applications > Information Retrieval
Natural Language Processing > Applications > Question Answering
Natural Language Processing > Resources & Methods > Multilingual NLP
Machine Learning > Learning Types > Retrieval-Augmented Generation
Artificial Intelligence > Core AI > Language

Keywords

multilingual NLP, multilingual retrieval, question answering, information retrieval, cross-lingual retrieval, African languages, cross-lingual question answering, open-retrieval question answering
