End-to-End Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks

Mathias B. Pedersen; Morten Kolbæk; Asger H. Andersen; Søren H. Jensen; Jesper Jensen

INTERSPEECH 2020

Abstract

Data-driven speech intelligibility prediction has been slow to take off. Datasets of measured speech intelligibility are scarce, so current models are relatively small and rely on hand-picked features. Classical predictors based on psychoacoustic models and heuristics remain the state of the art. This work proposes NSIP, a U-Net-inspired fully convolutional neural network architecture, trained and tested on ten datasets to predict the intelligibility of time-domain speech. The architecture is compared to a frequency-domain data-driven predictor and to the classical state-of-the-art predictors STOI, ESTOI, HASPI and SIIB. The performance of NSIP is found to be superior on datasets seen in the training phase. On unseen datasets, NSIP reaches performance comparable to that of the classical predictors.

Topics

Machine Learning > Core Methods > Regression; Deep Learning > Architectures > Neural Networks; Speech & Audio > Analysis > Speech Analysis; Deep Learning > Models > Neural Networks

Keywords

U-Net architecture; speech intelligibility; fully convolutional neural network; intelligibility prediction; time-domain speech
