Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech

Jilt Sebastian; Manoj Kumar; D. S. Pavan Kumar; Mathew Magimai.-Doss; Hema Murthy; Shrikanth Narayanan

INTERSPEECH 2018

Abstract

This paper presents a raw-waveform neural network and uses it together with a denoising network for clustering in weakly-supervised learning scenarios under extreme noise conditions. Specifically, we consider language-independent gender identification across a varied set of noise conditions and signal-to-noise ratios (SNRs). We formulate the denoising problem as a source separation task and train the system using a discriminative criterion in order to enhance output SNRs. A denoising recurrent neural network (RNN) is first trained on a small subset (roughly one-fifth) of the data to learn a speech-specific mask. The denoised speech signal is then fed directly as input to a raw-waveform convolutional neural network (CNN) trained on denoised speech. We evaluate the standalone performance of the denoiser in terms of various signal-to-noise measures and discuss its contribution towards robust gender identification. The combined pipeline achieves absolute improvements of 11.06% and 13.33% over the i-vector SVM baseline system for the 0 dB and -5 dB SNR conditions, respectively. We further analyse the information captured by the first CNN layer in both noisy and denoised speech.
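The mask-based, source-separation view of denoising described in the abstract can be illustrated with a toy NumPy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the function names are hypothetical, the "speech" is a pair of sinusoids, the noise is white Gaussian, and the mask is an oracle ratio mask computed from the known clean and noise signals, standing in for the mask that the paper's RNN learns. The discriminative training criterion and the raw-waveform CNN are not shown.

```python
import numpy as np

def power(x):
    """Mean power of a signal."""
    return float(np.mean(x ** 2))

def snr_db(reference, estimate):
    """SNR of `estimate` relative to the clean `reference`, in dB."""
    residual = estimate - reference
    return 10.0 * np.log10(power(reference) / power(residual))

def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so that speech + noise has the requested SNR.

    Returns the mixture and the scaled noise.
    """
    gain = np.sqrt(power(speech) / (power(noise) * 10.0 ** (target_snr_db / 10.0)))
    scaled = gain * noise
    return speech + scaled, scaled

def oracle_ratio_mask(speech, noise, eps=1e-8):
    """Magnitude ratio mask |S| / (|S| + |N|) per frequency bin.

    Assumption: an oracle stand-in for a learned speech-specific mask.
    """
    s_mag = np.abs(np.fft.rfft(speech))
    n_mag = np.abs(np.fft.rfft(noise))
    return s_mag / (s_mag + n_mag + eps)

def apply_mask(mixture, mask):
    """Apply a frequency-domain mask and return the time-domain signal."""
    return np.fft.irfft(mask * np.fft.rfft(mixture), n=len(mixture))

# Toy data: one second at 16 kHz (all values are illustrative).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
noise = rng.standard_normal(t.size)

# Build a -5 dB mixture, mask it, and compare input and output SNR.
mixture, scaled_noise = mix_at_snr(speech, noise, target_snr_db=-5.0)
mask = oracle_ratio_mask(speech, scaled_noise)
denoised = apply_mask(mixture, mask)

input_snr = snr_db(speech, mixture)
output_snr = snr_db(speech, denoised)
print(f"input SNR: {input_snr:.2f} dB, output SNR: {output_snr:.2f} dB")
```

In a pipeline like the one the abstract outlines, the masked output would then be passed as a raw waveform to the classifier; a real system would typically operate on short-time frames rather than one whole-signal transform as in this sketch.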

Topics

Machine Learning > Learning Types > Weakly Supervised Learning

Computer Vision > Processing > Audio Processing

Keywords

weakly supervised learning, convolutional neural network, recurrent neural network, gender identification, raw waveform, denoising network
