INTERSPEECH 2024

SWiBE: A Parameterized Stochastic Diffusion Process for Noise-Robust Bandwidth Expansion

Abstract

Speech recordings frequently suffer from a variety of distortions, and removing them is essential yet challenging. In this study, leveraging the recent success of score-based generative modeling (SGM), we propose a novel noise-robust bandwidth expansion (BWE) framework based on a parameterized stochastic diffusion process that expands the bandwidth stepwise in the spectrogram. Our proposed Step-Wised Bandwidth Expansion (SWiBE) method outperforms the baseline approaches on the considered metrics; the baselines include the current state-of-the-art noise-robust BWE model and various diffusion- and GAN-based models. Moreover, we analyze how the hyperparameters interact with performance across different aspects, including perceptual quality and spectral reconstruction. Our findings reveal that the score-based model exhibits distinct characteristics under different parameterizations.
