A Comparison of Perceptually Motivated Loss Functions for Binary Mask Estimation in Speech Separation

Danny Websdale; Ben Milner

INTERSPEECH 2017


Abstract

This work proposes and compares perceptually motivated loss functions for deep learning based binary mask estimation in speech separation. Previous loss functions have focused on maximising the classification accuracy of mask estimation. We instead propose loss functions that aim to maximise the hit minus false-alarm (HIT-FA) rate, which is known to correlate more closely with speech intelligibility. The baseline is binary cross-entropy (CE), a standard loss function for binary mask estimation that maximises classification accuracy. The first proposed loss function maximises the HIT-FA rate instead of classification accuracy. The second is a hybrid of CE and HIT-FA that balances classification accuracy against HIT-FA rate. Evaluations of the perceptually motivated loss functions on the GRID database show improvements in HIT-FA rate and ESTOI across babble and factory noises. Further tests explore the application of the perceptually motivated loss functions to a larger vocabulary dataset.
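
The abstract does not give the exact form of the proposed losses, so the following is only a minimal sketch of how a HIT-FA objective and a CE/HIT-FA hybrid might be written. It assumes a soft (differentiable) HIT-FA computed from mask probabilities, a loss of the form 1 - (HIT - FA), and a hypothetical mixing weight `alpha`; the function names and the weighting scheme are illustrative assumptions, not the authors' formulation.

```python
import numpy as np


def hit_fa(target, estimate, eps=1e-8):
    """Soft HIT-FA rate between a binary target mask and an estimated mask.

    HIT is the fraction of target-1 units recovered by the estimate and
    FA is the fraction of target-0 units wrongly set to 1. With a hard
    0/1 estimate this is the usual HIT-FA rate; with probabilities it is
    a smooth surrogate (an assumption of this sketch).
    """
    target = np.asarray(target, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    hit = np.sum(target * estimate) / (np.sum(target) + eps)
    fa = np.sum((1.0 - target) * estimate) / (np.sum(1.0 - target) + eps)
    return float(hit - fa)


def binary_cross_entropy(target, estimate, eps=1e-8):
    """Mean binary cross-entropy, the baseline loss named in the abstract."""
    target = np.asarray(target, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite.
    estimate = np.clip(np.asarray(estimate, dtype=float), eps, 1.0 - eps)
    losses = target * np.log(estimate) + (1.0 - target) * np.log(1.0 - estimate)
    return float(-np.mean(losses))


def hit_fa_loss(target, estimate):
    """Loss that decreases as HIT-FA increases (assumed form: 1 - HIT-FA)."""
    return 1.0 - hit_fa(target, estimate)


def hybrid_loss(target, estimate, alpha=0.5):
    """Weighted mix of CE and the HIT-FA loss; alpha is a hypothetical weight."""
    ce = binary_cross_entropy(target, estimate)
    return alpha * ce + (1.0 - alpha) * hit_fa_loss(target, estimate)


if __name__ == "__main__":
    target = np.array([1, 1, 0, 0])
    estimate = np.array([0.9, 0.6, 0.2, 0.1])
    print("HIT-FA:", hit_fa(target, estimate))
    print("CE:", binary_cross_entropy(target, estimate))
    print("hybrid:", hybrid_loss(target, estimate, alpha=0.5))
```

In a training setup the same expressions would be written in an automatic-differentiation framework so that gradients flow through the soft HIT and FA terms; NumPy is used here only to keep the sketch self-contained.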



Topics

Machine Learning > Optimization & Theory > Loss Functions
Deep Learning > Optimization & Theory > Loss Functions

Keywords

speech separation; speech intelligibility; binary mask estimation; perceptually motivated loss; hit false alarm rate


Related papers

Description of the Munich-Passau Snore Sound Corpus (MPSSC) 2017

A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification 2017

Binaural Reverberant Speech Separation Based on Deep Neural Networks 2017

Building Audio-Visual Phonetically Annotated Arabic Corpus for Expressive Text to Speech 2017

A Comparison of Danish Listeners’ Processing Cost in Judging the Truth Value of Norwegian, Swedish, and English Sentences 2017