INTERSPEECH 2023

Rethinking Complex-Valued Deep Neural Networks for Monaural Speech Enhancement

Abstract

Despite efforts to adopt complex-valued deep neural networks (CVDNNs), it remains unclear whether CVDNNs are generally more effective than real-valued DNNs (RVDNNs) for speech enhancement. This study systematically examines CVDNNs against their real-valued counterparts in monaural scenarios. We first compare the atomic units of CVDNNs with those of RVDNNs, and find that the use of complex-valued operations hinders model capacity when the model size is small. Moreover, we show that two notable CVDNNs, the deep complex convolutional recurrent network (DCCRN) and the deep complex U-Net (DCUNET), produce performance identical to that of their real-valued counterparts while requiring more computation. Our experimental results show that these CVDNNs do not provide a performance gain over RVDNNs for monaural speech enhancement, and are less desirable due to their higher computational cost. This study suggests that the efficacy of CVDNNs for speech enhancement needs to be rethought.
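As an illustration of what a complex-valued atomic unit computes, the following minimal sketch checks the standard identity that a complex linear map equals a real linear map on stacked real and imaginary parts with tied weights. It is not the paper's code: the function names `complex_linear` and `real_equivalent` and the toy values are hypothetical, chosen only for this example.

```python
# Illustrative sketch only; names and values are assumptions, not from the paper.

def complex_linear(W, x):
    """Complex matrix-vector product y = W x using Python's built-in complex type."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def real_equivalent(W, x):
    """Same map using only real arithmetic on stacked [real; imag] vectors.

    The equivalent real weight matrix is the block matrix
        [[W_r, -W_i],
         [W_i,  W_r]]
    i.e. a real layer of twice the width whose blocks share weights.
    """
    xr = [v.real for v in x]
    xi = [v.imag for v in x]
    # Real part of the output: W_r x_r - W_i x_i
    yr = [sum(w.real * a - w.imag * b for w, a, b in zip(row, xr, xi)) for row in W]
    # Imaginary part of the output: W_i x_r + W_r x_i
    yi = [sum(w.imag * a + w.real * b for w, a, b in zip(row, xr, xi)) for row in W]
    return yr + yi


W = [[1 + 2j, 0.5 - 1j], [-1j, 3 + 0j]]  # toy complex weights
x = [2 - 1j, 1 + 1j]                     # toy complex input

y = complex_linear(W, x)
stacked = real_equivalent(W, x)
print(y)        # complex outputs
print(stacked)  # real parts followed by imaginary parts
```

In this sketch each complex weight accounts for two real parameters but four real multiplications, whereas an unconstrained real layer uses one multiplication per parameter. This is one simple way to see why a complex-valued layer can cost more computation than a real-valued layer with the same number of parameters.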
