INTERSPEECH 2024

Locally Aligned Rectified Flow Model for Speech Enhancement Towards Single-Step Diffusion

Abstract

Diffusion models based on stochastic differential equations have proven effective for speech enhancement, the task of recovering clean speech signals from noisy ones. However, these models are computationally expensive, mainly because the reverse diffusion process requires a large number of function evaluations. To address this limitation, we propose the locally aligned rectified flow (LARF) model, a diffusion model based on ordinary differential equations that learns a transport mapping between the distributions of clean and noisy speech features. By introducing global and local flow matching losses, LARF constrains the transport mapping to be as straight as possible, which reduces the number of function evaluations required. In experiments, we demonstrate the effectiveness of LARF on two speech enhancement datasets: WSJ0-CHiME3 and VoiceBank-DEMAND. On WSJ0-CHiME3, LARF achieved a PESQ of 2.95 and an SI-SDR of 19.3 dB with a single step.
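To illustrate why a straighter transport mapping needs fewer function evaluations, the following is a minimal sketch of generic rectified-flow ideas in plain Python. It is not the LARF model or its global and local losses: the function names, the toy three-dimensional vectors, the convention that t = 0 is the noisy endpoint and t = 1 the clean one, and the two hand-written "oracle" velocity fields are all assumptions made for illustration.

```python
def interpolate(x_noisy, x_clean, t):
    """Point on the straight path between a noisy (t=0) and a clean (t=1) vector."""
    return [(1.0 - t) * n + t * c for n, c in zip(x_noisy, x_clean)]


def straight_velocity(x_noisy, x_clean):
    """Constant velocity of the straight path, i.e. x_clean - x_noisy."""
    return [c - n for n, c in zip(x_noisy, x_clean)]


def flow_matching_loss(predicted_velocity, x_noisy, x_clean):
    """Mean squared error between a predicted velocity and the straight-path target."""
    target = straight_velocity(x_noisy, x_clean)
    return sum((p - v) ** 2 for p, v in zip(predicted_velocity, target)) / len(target)


def euler_sample(velocity_fn, x_start, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with forward Euler."""
    x = list(x_start)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)  # one function evaluation per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def max_abs_error(a, b):
    """Largest absolute coordinate-wise difference between two vectors."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))


# Toy 3-dimensional "features" (made-up numbers, not real speech data).
x_noisy = [0.5, -1.0, 2.0]
x_clean = [0.0, 1.0, 1.5]
direction = straight_velocity(x_noisy, x_clean)


def constant_field(x, t):
    """Hypothetical oracle field: constant velocity along a straight path."""
    return direction


def time_varying_field(x, t):
    """Hypothetical oracle field with the same endpoints but time-varying speed.

    Its exact trajectory is x_t = x_noisy + t**2 * direction.
    """
    return [2.0 * t * d for d in direction]


one_step_straight = euler_sample(constant_field, x_noisy, num_steps=1)
one_step_varying = euler_sample(time_varying_field, x_noisy, num_steps=1)
many_step_varying = euler_sample(time_varying_field, x_noisy, num_steps=100)
```

In this toy setting, a single Euler step is exact for the constant-velocity field, while the time-varying field needs many steps to land near the clean endpoint. That is the intuition behind straightening the transport path, under the assumptions stated above.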
