QMixCAT: Unsupervised Speech Enhancement Using Quality-guided Signal Mixing and Competitive Alternating Model Training

Shilin Wang; Haixin Guan; Yanhua Long

INTERSPEECH 2024

Abstract

Most deep learning-based speech enhancement (SE) models are supervised and require paired noisy-mixture and clean-speech data for training. Such pairs are hard to obtain in real-world SE applications, so removing this requirement is important. In this paper, we introduce QMixCAT, an unsupervised SE approach that allows unlabeled noisy mixtures to be used for supervised-style training within a teacher-student framework. Specifically, we propose a quality-guided signal mixing (QMix) approach that generates supervised training data from noisy mixtures. The model is then trained on these data in a teacher-student framework, with the QMix process repeated in every epoch. In addition, we propose a competitive alternating model training (CAT) mechanism to improve the quality of both the teacher and the student models. Experimental results demonstrate that QMixCAT significantly outperforms strong baselines across multiple evaluation metrics.
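The abstract does not give the details of QMix or CAT, so the following is only a minimal sketch of one possible reading, not the authors' implementation. Everything in it is an assumption made for illustration: the function names `qmix_pairs` and `competitive_swap`, the use of the residual `mix - teacher(mix)` as a noise estimate, the threshold-based quality filter, the random remixing, and the rule that the student takes over the teacher role when it scores higher. The `teacher` and `quality_fn` arguments are hypothetical callables standing in for an enhancement model and a non-intrusive quality estimator.

```python
import numpy as np


def qmix_pairs(mixtures, teacher, quality_fn, threshold, rng):
    """Build pseudo-supervised (input, target) pairs from unlabeled mixtures.

    Assumes all mixtures are 1-D arrays of equal length.
    """
    speech, noise = [], []
    for mix in mixtures:
        est_speech = teacher(mix)       # teacher's enhanced estimate
        est_noise = mix - est_speech    # residual treated as noise (assumption)
        noise.append(est_noise)
        # Keep only estimates that the quality scorer rates highly enough.
        if quality_fn(est_speech) >= threshold:
            speech.append(est_speech)

    pairs = []
    for s in speech:
        # Remix each kept speech estimate with a randomly drawn noise estimate.
        n = noise[rng.integers(len(noise))]
        pairs.append((s + n, s))        # (new noisy input, pseudo-clean target)
    return pairs


def competitive_swap(teacher, student, teacher_score, student_score):
    """Return (teacher, student), swapping roles if the student scores higher."""
    if student_score > teacher_score:
        return student, teacher
    return teacher, student
```

In a training loop under these assumptions, `qmix_pairs` would be called once per epoch with the current teacher, the student would be trained on the returned pairs, and `competitive_swap` would then decide which model acts as teacher in the next epoch based on some validation score.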

Authors

Shilin Wang, Haixin Guan, Yanhua Long

Topics

Machine Learning > Learning Types > Unsupervised Learning

Keywords

unsupervised learning, speech enhancement, teacher-student framework
