EMNLP 2022
Adversarial Text-to-Speech for low-resource languages
Abstract
In this paper, we propose a new method for training adversarial text-to-speech (TTS) models for low-resource languages using auxiliary data. Specifically, we modify the MelGAN (Kumar et al., 2019) architecture to improve Arabic speech generation, exploring several additional datasets and architectural choices, including extra discriminators designed to exploit high-frequency similarities between languages. For evaluation, we used subjective human ratings in the form of the Mean Opinion Score (MOS), together with a novel quantitative metric, the Fréchet Wav2Vec Distance, which we found to be well correlated with MOS. Both subjectively and quantitatively, our method outperformed the standard MelGAN model.
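The abstract does not give the formula for the Fréchet Wav2Vec Distance. By analogy with the Fréchet Inception Distance, one plausible reading is the Fréchet distance between Gaussians fitted to wav2vec embeddings of real and generated speech: ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). The NumPy sketch below assumes that reading. The wav2vec model, layer, and pooling into one vector per clip are not specified here, so feature extraction is left out and random arrays stand in for embeddings; the function names are hypothetical.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # Clip tiny negative eigenvalues caused by floating-point error.
    eigvals = np.clip(eigvals, 0.0, None)
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets.

    emb_a, emb_b: arrays of shape (num_clips, dim), assumed to hold one
    pooled wav2vec embedding per audio clip (extraction not shown).
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)

    # Tr((cov_a cov_b)^{1/2}) equals Tr((s cov_b s)^{1/2}) with s = cov_a^{1/2};
    # the second matrix is symmetric PSD, so the eigh-based root applies.
    s = _sqrtm_psd(cov_a)
    cross = _sqrtm_psd(s @ cov_b @ s)

    mean_term = float(np.sum((mu_a - mu_b) ** 2))
    cov_term = float(np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross))
    return mean_term + cov_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 16))            # stand-in "real speech" embeddings
    fake = rng.normal(loc=0.5, size=(500, 16))   # stand-in "generated" embeddings
    print(frechet_distance(real, real))  # near zero, up to floating-point error
    print(frechet_distance(real, fake))  # larger: the distributions differ
```

Lower values mean the generated-speech embeddings are statistically closer to the real ones, which is the property that would let such a metric track MOS.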
Topics
Machine Learning > Learning Types > Adversarial Learning
Speech & Audio > Synthesis > Text-to-Speech
Speech & Audio > Analysis > Speech Enhancement
Machine Learning > Learning Types > Multi-Modal Learning
Deep Learning > Learning Types > Adversarial Learning
Deep Learning > Learning Types > Generative Models
Speech & Audio > Synthesis > Speech Synthesis