INTERSPEECH 2023

Towards Multi-Lingual Audio Question Answering

Abstract

Audio Question Answering (AQA) is a multi-modal translation task in which a system analyzes an audio signal and a natural language question to generate a natural language answer. AQA has been studied primarily through the lens of the English language. However, addressing AQA in other languages in the same manner would require considerable resources. This paper proposes scalable solutions to multi-lingual audio question answering on both the data and modeling fronts. We propose mClothoAQA, a translation-based multi-lingual AQA dataset in eight languages. The dataset consists of 1991 audio files and nearly 0.3 million question-answer pairs. Finally, we introduce a multi-lingual AQA model and demonstrate its strong performance in eight languages. The dataset and code can be accessed at https://github.com/swarupbehera/mAQA.
