Noise-robust Speech Separation with Fast Generative Correction

Helin Wang; Jesus Villalba; Laureano Moro-Velazquez; Jiarui Hai; Thomas Thebaud; Najim Dehak

INTERSPEECH 2024

Abstract

Speech separation, the task of isolating individual speech sources from a mixed audio signal, remains challenging in noisy environments. In this paper, we propose a generative correction method that enhances the output of a discriminative separator. Using a generative corrector based on a diffusion model, we refine the separation of single-channel speech mixtures by removing noise and perceptually unnatural distortions. Furthermore, we optimize the generative model with a predictive loss, which reduces the diffusion model’s reverse process to a single step and corrects errors introduced by that process. Our method achieves state-of-the-art performance on the in-domain noisy Libri2Mix dataset and on out-of-domain WSJ data with a variety of noise types, improving SI-SNR by 22-35% relative to SepFormer and demonstrating robustness and strong generalization.
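
The abstract reports its gains in SI-SNR (scale-invariant signal-to-noise ratio). For readers unfamiliar with the metric, the following is a minimal sketch of the standard SI-SNR definition in NumPy. It is not the authors' evaluation code; the function name `si_snr` and the `eps` stabilizer are assumptions made for illustration.

```python
import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between a 1-D estimate and a 1-D reference.

    Hypothetical helper for illustration; `eps` only guards against
    division by zero and log of zero.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference

    # Everything not explained by the reference counts as error.
    residual = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why the metric suits separation systems whose output gain is arbitrary.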

Topics

Speech & Audio > Synthesis > Speech Enhancement

Keywords

speech separation; noise robustness; diffusion model; speech quality; generative correction
