2022 INTERSPEECH INTERSPEECH 2022

Memory-Efficient Multi-Step Speech Enhancement with Neural ODE

Abstract

Although deep learning-based models proposed in the past years have achieved remarkable results on the speech enhancement tasks, the existing multi-step denoising methods require a memory size proportional to the number of steps during training, which makes it difficult to apply to large models. In this paper, we propose a memory-efficient multi-step speech enhancement method that requires only constant amount of memory for model training. This End-to-End method combines Neural Ordinary Differential Equations (Neural ODEs) with the Memory-efficient Asynchronous Leapfrog Integrator (MALI) for multi-step training. Experiments on the Voice Bank and DEMAND datasets showed that the multi-step method using MALI had better performance than the single-step method, with maximum improvements of 0.16 on PESQ and 0.5% on STOI. In addition to reducing the memory required for model training, this method is also quite competitive with the current state-of-the-art methods.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning
🧭 Keyword Pioneer — multi-step training
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio