INTERSPEECH 2021

Voice Privacy Through x-Vector and CycleGAN-Based Anonymization

Abstract

The growing use of voice assistants and spoken language interfaces has raised significant concerns about the privacy of voice data. To reduce the threat of attacks on voice data, we propose a speaker anonymization system based on CycleGAN. The method modifies the gender and accent information of the speaker in the original speech signal, and it yields a more natural-sounding anonymized voice in addition to a de-identified speaker. We adopt baseline-1 of the Voice Privacy Challenge 2020 as our baseline system. CycleGAN training and the ASR and ASV experiments are performed on a subset of the LibriSpeech corpus. We also explore a double anonymization technique, in which CycleGAN-based anonymization is applied on top of the baseline system. Experimental results show that combining the proposed method with the baseline system, which is based on x-vectors and a neural source-filter (NSF) model, gives up to 5.61% relative improvement in EER for original-anonymized enroll-trial pairs and up to 19.30% relative improvement in EER for anonymized-anonymized enroll-trial pairs. We observe that, along with good speaker de-identification, the anonymized utterances retain adequate speech intelligibility and naturalness.
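The relative EER figures quoted above can be read with the help of a small sketch. It is illustrative only and is not the evaluation code used in the paper: the function names `compute_eer` and `relative_change_percent` are hypothetical, the threshold sweep is a simple approximation of the EER, and the sign convention (a positive value means the EER rose relative to the baseline, which is the desired direction when the ASV system plays the attacker) is an assumption. The scores in the usage lines are made-up values, not results from the paper.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Approximate the equal error rate (as a fraction) by a threshold sweep."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    # Every observed score is tried as a decision threshold.
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # False rejection rate: target trials scored below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: non-target trials scored at or above the threshold.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def relative_change_percent(baseline, proposed):
    """Relative change of `proposed` with respect to `baseline`, in percent.

    Assumed convention: positive means the proposed value is higher.
    """
    return 100.0 * (proposed - baseline) / baseline


# Made-up ASV scores for illustration (higher score = "same speaker").
eer = compute_eer([0.9, 0.8], [0.1, 0.2])
# Made-up EER values in percent, not taken from the paper.
gain = relative_change_percent(20.0, 25.0)
```

Under this convention, a baseline EER of 20% and a proposed-system EER of 25% would be reported as a 25% relative improvement, even though the absolute difference is 5 percentage points.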
