INTERSPEECH 2021

An Exemplar Selection Algorithm for Native-Nonnative Voice Conversion

Abstract

We present an algorithm for selecting exemplars for native-to-nonnative voice conversion (VC) using a Sparse, Anchor-Based Representation of speech (SABR). The algorithm uses phoneme labels and clustering to learn optimal exemplars when the source and target speakers' utterances are poorly time-aligned, as is common in native-to-nonnative voice conversion. We evaluate the method on speech from the ARCTIC and L2-ARCTIC corpora and compare it to a baseline exemplar-based VC algorithm. The proposed algorithm significantly improves synthesis quality, more than doubling that of the baseline exemplar-based VC system while using two orders of magnitude fewer atoms. It also significantly reduces VC error and improves synthesis quality relative to unoptimized SABR models. We discuss the implications of both optimization algorithms for SABR and for exemplar-based VC systems more broadly.
