CVC: Contrastive Learning for Non-Parallel Voice Conversion

Tingle Li; Yichen Liu; Chenxu Hu; Hang Zhao

INTERSPEECH 2021

Abstract

Cycle-consistent generative adversarial network (CycleGAN) and variational autoencoder (VAE) based models have recently gained popularity in non-parallel voice conversion. However, they often suffer from a difficult training process and unsatisfactory results. In this paper, we propose a contrastive learning-based adversarial approach for voice conversion, namely contrastive voice conversion (CVC). Compared to previous CycleGAN-based methods, CVC requires only efficient one-way GAN training by taking advantage of contrastive learning. For non-parallel one-to-one voice conversion, CVC is on par with or better than CycleGAN and VAE while effectively reducing training time. CVC further demonstrates superior performance in many-to-one voice conversion, enabling conversion from unseen speakers.
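
The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of the generic InfoNCE loss commonly used in contrastive learning, not the authors' implementation. The function name `info_nce_loss`, the temperature value, and the use of plain Python lists as feature vectors are all assumptions made for illustration. The idea is that a feature from the converted output (the query) should be more similar to the matching feature from the input (the positive) than to unrelated features (the negatives).

```python
import math


def info_nce_loss(query, positive, negatives, temperature=0.07):
    """Generic InfoNCE loss for one query vector (illustrative sketch only).

    All vectors are assumed to be non-zero lists of floats of equal length.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    q = normalize(query)
    # Cosine-similarity logits: the positive comes first, then the negatives.
    logits = [dot(q, normalize(positive)) / temperature]
    logits += [dot(q, normalize(n)) / temperature for n in negatives]

    # Cross-entropy with the positive at index 0, using a stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


# Hypothetical 2-D features: the first call pairs the query with a matching
# positive, the second pairs it with a mismatched one.
matched = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
mismatched = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
print(matched < mismatched)
```

Under these assumptions the loss is small when the query aligns with its positive and large when a negative is more similar, which is the signal that would stand in for a cycle-consistency constraint in a one-way GAN setup.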

Authors

Tingle Li, Yichen Liu, Chenxu Hu, Hang Zhao

Topics

Speech & Audio > Synthesis > Speech Enhancement
Speech & Audio > Analysis > Speaker Verification
Deep Learning > Learning Types > Contrastive Learning

Keywords

contrastive learning, voice conversion, generative adversarial network, cycle consistency, speaker identity, speaker similarity, non-parallel training
