Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?

Erica Cooper; Cheng-I Lai; Yusuke Yasuda; Junichi Yamagishi

INTERSPEECH 2020

Abstract

Previous work on speaker adaptation for end-to-end speech synthesis still falls short in speaker similarity. We investigate speaker augmentation, an approach orthogonal to current speaker adaptation paradigms, by creating artificial speakers and by taking advantage of low-quality data. The base Tacotron2 model is modified to account for the channel and dialect factors inherent in these corpora. In addition, we describe a warm-start training strategy that we adopted for Tacotron2 training. A large-scale listening test is conducted, and a distance metric is adopted to evaluate synthesis of dialects. This is followed by an analysis of synthesis quality, speaker and dialect similarity, and a remark on the effectiveness of our speaker augmentation approach. Audio samples are available online.

Topics

Speech & Audio > Synthesis > Text-to-Speech

Keywords

speaker adaptation, end-to-end synthesis, neural network, speaker augmentation
