INTERSPEECH 2024

QMixCAT: Unsupervised Speech Enhancement Using Quality-guided Signal Mixing and Competitive Alternating Model Training

Abstract

Most deep learning-based speech enhancement (SE) models are supervised and require paired noisy mixtures and clean speech for training. Such pairs are difficult to obtain for real-world recordings, which limits the practical use of supervised SE, so removing this requirement is crucial. In this paper, we introduce QMixCAT, an unsupervised SE approach that allows a model to be trained on unpaired noisy mixtures in a supervised manner within a teacher-student framework. Specifically, we propose a quality-guided signal mixing (QMix) approach that generates supervised training data from noisy mixtures. The model is then trained on these data in a teacher-student framework, with the QMix process repeated in every epoch. In addition, we propose a competitive alternating model training (CAT) mechanism to further improve the quality of both the teacher and the student models. Experimental results demonstrate that QMixCAT significantly outperforms strong baselines across multiple evaluation metrics.
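
The abstract does not specify how QMix measures quality, how the signals are mixed, or how CAT decides between models, so the following is only a minimal Python sketch of one plausible reading of the pipeline, not the authors' method. It assumes that the teacher's output serves as a pseudo-clean target, that the residual (mixture minus estimate) is treated as noise and remixed with a different kept estimate, that a hypothetical non-intrusive scorer `quality_fn` filters out low-quality estimates, and that the "competitive" step simply keeps whichever model scores higher as the next teacher. All names (`make_pseudo_pairs`, `competitive_swap`, `quality_fn`) are illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

Signal = List[float]
Model = Callable[[Signal], Signal]     # maps a noisy mixture to a speech estimate
QualityFn = Callable[[Signal], float]  # hypothetical non-intrusive quality score


def make_pseudo_pairs(
    mixtures: Sequence[Signal],
    teacher: Model,
    quality_fn: QualityFn,
    min_quality: float,
    rng: random.Random,
) -> List[Tuple[Signal, Signal]]:
    """Turn unpaired noisy mixtures into (pseudo_noisy, pseudo_clean) pairs.

    Assumed scheme: the teacher's output is the pseudo-clean target, the
    residual (mixture minus estimate) is treated as noise, and only estimates
    whose quality score reaches `min_quality` are kept (the "quality guidance").
    """
    kept: List[Tuple[Signal, Signal]] = []
    for x in mixtures:
        s_hat = teacher(x)
        n_hat = [xv - sv for xv, sv in zip(x, s_hat)]  # residual used as noise
        if quality_fn(s_hat) >= min_quality:
            kept.append((s_hat, n_hat))

    if len(kept) < 2:
        return []  # need at least two kept mixtures to swap noises between them

    pairs: List[Tuple[Signal, Signal]] = []
    for i, (s_hat, _) in enumerate(kept):
        # Remix the estimate with residual noise borrowed from a different mixture.
        j = rng.choice([k for k in range(len(kept)) if k != i])
        noise = kept[j][1]
        pseudo_noisy = [sv + nv for sv, nv in zip(s_hat, noise)]
        pairs.append((pseudo_noisy, s_hat))
    return pairs


def competitive_swap(
    teacher: Model,
    student: Model,
    val_mixtures: Sequence[Signal],
    quality_fn: QualityFn,
) -> Tuple[Model, Model]:
    """Hypothetical competitive step: the model whose outputs score higher on
    held-out mixtures becomes (or stays) the teacher for the next round."""

    def mean_quality(model: Model) -> float:
        return sum(quality_fn(model(x)) for x in val_mixtures) / len(val_mixtures)

    if mean_quality(student) > mean_quality(teacher):
        return student, teacher
    return teacher, student
```

In an epoch loop, one would regenerate the pairs with the current teacher every epoch, train the student on them with any supervised SE loss, and then call `competitive_swap`. A naive quality score can be gamed (for example, by a model that returns its input unchanged), so in practice `quality_fn` would need to be a genuine speech-quality estimator.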
