2020 INTERSPEECH INTERSPEECH 2020

On Synthesis for Supervised Monaural Speech Separation in Time Domain

Abstract

Time-domain approaches for speech separation have achieved great success recently. However, the sources separated by these time-domain approaches usually contain some artifacts (broadband noises), especially when separating mixture with noise. In this paper, we incorporate synthesis way into the time-domain speech separation approaches to deal with above broadband noises in separated sources, which can be seamlessly used in the speech separation system by a ‘plug-and-play’ way. By directly learning an estimation for each source in encoded domain, synthesis way can reduce artifacts in estimated speeches and improve the speech separation performance. Extensive experiments on different state-of-the-art models reveal that the synthesis way acquires the ability to handle with noisy mixture and is more suitable for noisy speech separation. On a new benchmark noisy dataset, the synthesis way obtains 0.97 dB (10.1%) SDR relative improvement and respective gains on various metrics without extra computation cost.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Speech & Audio
🧭 Keyword Pioneer — artifact reduction
🐣 Hot Topic Early Bird — source separation
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio