INTERSPEECH 2017

Areal and Phylogenetic Features for Multilingual Speech Synthesis

Abstract

We introduce phylogenetic and areal language features to the domain of multilingual text-to-speech synthesis. Intuitively, enriching the existing universal phonetic features with cross-lingual shared representations should benefit multilingual acoustic models and help address issues such as data scarcity for low-resource languages. We investigate these representations using acoustic models based on long short-term memory recurrent neural networks. Subjective evaluations conducted on eight languages from diverse language families show that phylogenetic and areal representations sometimes lead to significant improvements in multilingual synthesis quality. To better leverage these novel features, improving the baseline phonetic representation may be necessary.
