INTERSPEECH 2020
Improved Zero-Shot Voice Conversion Using Explicit Conditioning Signals
Abstract
In this paper, we propose a zero-shot voice conversion algorithm that adds several conditioning signals to explicitly transfer prosody, linguistic content, and dynamics to the conversion results. We show that the proposed approach improves overall conversion quality and generalization to out-of-domain samples relative to a baseline implementation of AutoVC, because the conditioning signals can help reduce the burden on the model’s encoder to implicitly learn all of the different aspects involved in speech production. An ablation analysis illustrates the effectiveness of the proposed method.