Text-to-speech system for low-resource languages: A case study in Shipibo-Konibo (a Panoan language from Peru)

Daniel Menendez; Hector Gomez

2025 NAACL NAACL 2025

Text-to-speech system for low-resource languages: A case study in Shipibo-Konibo (a Panoan language from Peru)

Abstract

AbstractThis paper presents the design and development of a Text-to-Speech (TTS) model for Shipibo-Konibo, a low-resource indigenous language spoken mainly in the Peruvian Amazon. Despite the challenge posed by the scarcity of data, the model was trained with over 4 hours of recordings and 3,025 meticulously collected written sentences. The tests results demon strated an intelligibility rate (IR) exceeding 88% and a mean opinion score (MOS) of 4.01, confirming the quality of the audio generated by the model, which comprises the Tacotron 2 spectrogram predictor and the HiFi-GAN vocoder. Furthermore, the potential of this model to be trained in other indigenous languages spoken in Peru is highlighted, opening a promising avenue for the documentation and revitalization of these languages.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Speech & Audio

🧭 Keyword Pioneer — spectrogram predictor

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio