INTERSPEECH 2024

DNN-based monaural speech enhancement using alternate analysis windows for phase and magnitude modification

Abstract

In recent decades, considerable research has been devoted to speech enhancement based on short-term Fourier transform (STFT) analysis. As speech processing technology has evolved, the importance of phase information for speech intelligibility has become more apparent. The Hanning window has typically been used as the analysis window in the STFT. In this study, we propose using the Chebyshev window for phase analysis and the Hanning window for magnitude analysis. We also introduce a novel cepstral domain enhancement approach designed to robustly reinforce the harmonic structure of speech. The performance of our model is evaluated on the DNS challenge test set as well as the naturalistic APOLLO Fearless Steps evaluation set. Experimental results demonstrate that the Chebyshev-based phase solution outperforms the Hanning option in phase-aware speech enhancement. Furthermore, the incorporation of quefrency emphasis proves effective in enhancing overall speech quality.
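
The idea of using different analysis windows for magnitude and phase can be illustrated with a minimal Python sketch. It is not the authors' implementation: the frame length, hop size, Chebyshev sidelobe attenuation (`cheb_atten_db`), and the helper names `frame_signal`, `dual_window_stft`, and `combine` are all assumptions chosen for illustration, and the DNN and cepstral stages are omitted.

```python
import numpy as np
from scipy.signal.windows import chebwin, hann


def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def dual_window_stft(x, frame_len=512, hop=128, cheb_atten_db=100.0):
    """Return (magnitude, phase) computed with two different analysis windows.

    Magnitude comes from a Hann(ing)-windowed STFT, phase from a
    Chebyshev-windowed STFT. All parameter values are illustrative
    assumptions, not values taken from the paper.
    """
    frames = frame_signal(x, frame_len, hop)
    mag_win = hann(frame_len, sym=False)
    phase_win = chebwin(frame_len, at=cheb_atten_db, sym=False)
    magnitude = np.abs(np.fft.rfft(frames * mag_win, axis=1))
    phase = np.angle(np.fft.rfft(frames * phase_win, axis=1))
    return magnitude, phase


def combine(magnitude, phase):
    """Build a complex spectrum from separately estimated magnitude and phase."""
    return magnitude * np.exp(1j * phase)


# Hypothetical input: one second of noise at 16 kHz.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
magnitude, phase = dual_window_stft(x)
spectrum = combine(magnitude, phase)
```

In a phase-aware enhancement pipeline, the two streams would each be modified before being recombined as in `combine`; the sketch stops at the analysis stage because the abstract does not specify the later steps.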
