An Exemplar Selection Algorithm for Native-Nonnative Voice Conversion

Christopher Liberatore; Ricardo Gutierrez-Osuna

INTERSPEECH 2021

Abstract

We present an algorithm for selecting exemplars for native-to-nonnative voice conversion (VC) using a Sparse, Anchor-Based Representation of speech (SABR). The algorithm uses phoneme labels and clustering to learn optimal exemplars when source and target speakers are affected by poor time alignment, as is common in native-to-nonnative voice conversion. We evaluate the method on speech from the ARCTIC and L2-ARCTIC corpora and compare it to a baseline exemplar-based VC algorithm. The proposed algorithm more than doubles the synthesis quality of the baseline exemplar-based VC system while using two orders of magnitude fewer atoms. Additionally, it significantly reduces VC error and improves synthesis quality compared to unoptimized SABR models. We discuss the implications of both optimization algorithms for SABR and broader exemplar-based VC systems.
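As a rough illustration of selecting exemplars from phoneme-labeled speech frames, the sketch below picks, for each phoneme label, the frame closest to that label's centroid. This is not the algorithm from the paper: the function name `select_exemplars`, the toy two-dimensional frames, and the choice of a single cluster per phoneme are all assumptions made for the example.

```python
from collections import defaultdict
from math import dist


def select_exemplars(frames, labels):
    """Return one exemplar per phoneme label (hypothetical helper).

    The exemplar is the labeled frame nearest to the mean of all
    frames sharing that label, i.e. a one-cluster-per-phoneme choice.
    """
    # Group feature frames by their phoneme label.
    groups = defaultdict(list)
    for frame, label in zip(frames, labels):
        groups[label].append(frame)

    exemplars = {}
    for label, members in groups.items():
        dim = len(members[0])
        # Centroid of all frames carrying this label.
        centroid = [sum(m[i] for m in members) / len(members) for i in range(dim)]
        # Keep a real frame (not the centroid itself) as the exemplar.
        exemplars[label] = min(members, key=lambda m: dist(m, centroid))
    return exemplars


# Toy data: six 2-D "frames" with made-up phoneme labels.
frames = [(0.0, 0.0), (1.0, 1.0), (0.4, 0.6), (5.0, 5.0), (6.0, 6.0), (9.0, 9.0)]
labels = ["AA", "AA", "AA", "IY", "IY", "IY"]
exemplars = select_exemplars(frames, labels)
```

Because the selection relies only on labels, it needs no frame-level time alignment between speakers. In this sketch the same labels could be applied to each speaker's frames separately to obtain paired exemplars.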

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Core Methods > Representation Learning
Machine Learning > Core Methods > Feature Selection
Speech & Audio > Analysis > Speech Analysis

Keywords

sparse representation, voice conversion, exemplar selection, phoneme alignment, exemplar-based method, anchor-based representation, phoneme labeling
