Speech Enhancement with Weakly Labelled Data from AudioSet

Qiuqiang Kong; Haohe Liu; Xingjian Du; Li Chen; Rui Xia; Yuxuan Wang

INTERSPEECH 2021

Abstract

Speech enhancement aims to improve the intelligibility and perceptual quality of degraded speech signals. Recently, neural network-based methods have been applied to speech enhancement. However, many of these methods require users to collect clean speech and background noise for training, which can be time-consuming. In addition, speech enhancement systems trained on particular types of background noise may not generalize well to a wide range of noise. To tackle these problems, we propose a speech enhancement framework trained on weakly labelled data. We first apply a pretrained sound event detection system to detect anchor segments that contain sound events in audio clips. Then, we randomly mix two detected anchor segments to form a mixture. We build a conditional source separation network that takes the mixture and a conditional vector as input. The conditional vector is obtained from the audio tagging predictions on the anchor segments. At inference, we input a noisy speech signal together with the one-hot encoding of “Speech” as the condition, and the trained system predicts the enhanced speech. Our system achieves a PESQ of 2.28 and an SSNR of 8.75 dB on the VoiceBank-DEMAND dataset, outperforming the previous SEGAN system, which scores 2.16 and 7.73 dB respectively.
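The data pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice to centre each anchor segment on the most confident detection frame, the plain additive mix, and the position of "Speech" in the label list are all assumptions made for illustration.

```python
import numpy as np

NUM_CLASSES = 527   # size of the AudioSet label set
SPEECH_INDEX = 0    # assumed position of "Speech" in the label list


def pick_anchor(waveform, frame_probs, segment_len):
    """Crop a fixed-length segment centred on the most confident frame.

    frame_probs holds one event probability per frame, as produced by a
    hypothetical pretrained sound event detector (not implemented here).
    """
    hop = len(waveform) / len(frame_probs)            # samples per frame
    centre = int((np.argmax(frame_probs) + 0.5) * hop)
    start = int(np.clip(centre - segment_len // 2,
                        0, len(waveform) - segment_len))
    return waveform[start:start + segment_len]


def make_training_example(seg_a, seg_b, tags_a):
    """Mix two anchor segments into one training triple.

    The separation network receives the mixture and the audio tagging
    prediction of seg_a, and is trained to output seg_a.
    """
    mixture = seg_a + seg_b
    return mixture, tags_a, seg_a   # (input, condition, target)


def speech_condition():
    """One-hot condition that requests the speech source at inference."""
    cond = np.zeros(NUM_CLASSES, dtype=np.float32)
    cond[SPEECH_INDEX] = 1.0
    return cond
```

During training the condition is a soft tagging prediction, while at inference it is replaced by the one-hot vector from `speech_condition()`, so the same network that learned to separate arbitrary sound classes is asked for speech only.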

Topics

Machine Learning > Learning Types > Weakly Supervised Learning
Speech & Audio > Synthesis > Speech Enhancement
Deep Learning > Learning Types > Self-Supervised Learning

Keywords

source separation; weakly supervised learning; speech enhancement; audio tagging; neural network; conditional source separation; weakly labelled data
