INTERSPEECH 2021

Cross-Lingual Voice Conversion with a Cycle Consistency Loss on Linguistic Representation

Abstract

Cross-Lingual Voice Conversion (XVC) aims to change the speaker identity of source speech to that of a target speaker while preserving the source linguistic content. This paper introduces a cycle consistency loss on the linguistic representation to ensure that the speech content remains unchanged after conversion. The proposed XVC model is optimized with two loss functions: a spectral reconstruction loss and a linguistic cycle consistency loss. The cycle consistency loss seeks to maintain the linguistic content of the source speech, which we represent with a Phonetic PosteriorGram (PPG). XVC experiments were conducted between English and Mandarin. Both objective and subjective evaluations show that the proposed cycle consistency loss makes the converted speech more intelligible.
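
The two-term objective described above might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the distance measure, the loss weighting, or the PPG extractor, so the mean absolute error, the `cycle_weight` parameter, and `toy_ppg_extractor` (a per-frame softmax standing in for a real PPG model) are all hypothetical.

```python
import math


def mean_abs_error(a, b):
    """Mean absolute difference between two equally shaped frame sequences."""
    total = sum(abs(x - y) for fa, fb in zip(a, b) for x, y in zip(fa, fb))
    count = sum(len(fa) for fa in a)
    return total / count


def toy_ppg_extractor(frames):
    """Hypothetical stand-in for a PPG extractor: a per-frame softmax.

    A real PPG would come from a trained acoustic model; this toy version
    only produces per-frame posterior-like vectors that sum to one.
    """
    ppg = []
    for frame in frames:
        peak = max(frame)
        exps = [math.exp(v - peak) for v in frame]
        norm = sum(exps)
        ppg.append([e / norm for e in exps])
    return ppg


def xvc_loss(predicted, reference, converted, source_ppg, extractor,
             cycle_weight=1.0):
    """Combine the two losses named in the abstract (assumed form).

    recon: spectral reconstruction loss between predicted and reference frames.
    cycle: linguistic cycle consistency loss between the PPG re-extracted
           from the converted speech and the PPG of the source speech.
    """
    recon = mean_abs_error(predicted, reference)
    cycle = mean_abs_error(extractor(converted), source_ppg)
    return recon + cycle_weight * cycle, recon, cycle


# Toy data: two frames with three spectral bins each.
source = [[0.1, 0.5, 0.2], [0.9, 0.3, 0.4]]
source_ppg = toy_ppg_extractor(source)

# If the converted speech carries exactly the source content, the cycle term vanishes.
total, recon, cycle = xvc_loss(source, source, source, source_ppg, toy_ppg_extractor)
```

In this sketch, a converted utterance whose re-extracted PPG drifts from the source PPG raises the cycle term, which is the pressure toward preserved linguistic content that the abstract describes.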
