2025 ACL ACL 2025

SingaKids: A Multilingual Multimodal Dialogic Tutor for Language Learning

Abstract

AbstractThe integration of generative artificial intelligence into educational applications has enhanced personalized and interactive learning experiences, and it shows strong potential to promote young learners language acquisition. However, it is still challenging to ensure consistent and robust performance across different languages and cultural contexts, and kids-friendly design requires simplified instructions, engaging interactions, and age-appropriate scaffolding to maintain motivation and optimize learning outcomes.In this work, we introduce SingaKids, a dialogic tutor designed to facilitate language learning through picture description tasks. Our system integrates dense image captioning, multilingual dialogic interaction, speech understanding, and engaging speech generation to create an immersive learning environment in four languages: English, Mandarin, Malay, and Tamil. We further improve the system through multilingual pre-training, task-specific tuning, and scaffolding optimization. Empirical studies with elementary school students demonstrate that SingaKids provides effective dialogic teaching, benefiting learners at different performance levels.

🧭 Keyword Pioneer — generative artificial intelligence
🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Deep Learning, Interdisciplinary, Machine Learning, Natural Language Processing, Reinforcement Learning
🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning and Interdisciplinary and Natural Language Processing and Speech & Audio