INTERSPEECH 2020

Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System

Abstract

The measurement of speech intelligibility (SI) still relies mainly on time-consuming and expensive subjective experiments because no versatile objective measure can predict SI. One promising candidate for SI prediction is an approach based on deep neural network (DNN)-based automatic speech recognition (ASR) systems, which have advanced greatly in recent years. In this paper, we propose and evaluate SI prediction methods based on the posteriors of DNN-based ASR systems. The posteriors, which are the probabilities of phones given acoustic features, are derived using forced alignments between clean speech and a phone sequence. We evaluated several variations of the posteriors to improve prediction performance. In our experiments, a prediction method using a squared cumulative posterior probability achieved better accuracy than conventional SI predictors based on well-established objective measures (STOI and eSTOI).
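As a rough illustration of the idea of reading posteriors along a forced alignment, the following minimal sketch picks, for each frame, the posterior of the phone that the alignment assigns to that frame and then aggregates the values into a single score. The abstract does not define how the squared cumulative posterior is computed, so the aggregation here (the squared mean of the aligned posteriors) is a hypothetical stand-in and not the method of the paper. All function names and the toy data are assumptions made for illustration only.

```python
from typing import List, Sequence


def aligned_posteriors(posteriors: Sequence[Sequence[float]],
                       alignment: Sequence[int]) -> List[float]:
    """For each frame, return the posterior of the phone the alignment assigns to it.

    posteriors[t][k] is assumed to be P(phone k | acoustic features of frame t).
    alignment[t] is assumed to be the phone index for frame t, obtained from a
    forced alignment of clean speech against the phone sequence.
    """
    if len(posteriors) != len(alignment):
        raise ValueError("need exactly one aligned phone index per frame")
    return [frame[phone] for frame, phone in zip(posteriors, alignment)]


def mean_aligned_posterior(posteriors: Sequence[Sequence[float]],
                           alignment: Sequence[int]) -> float:
    """Average the aligned posteriors over all frames of the utterance."""
    picked = aligned_posteriors(posteriors, alignment)
    if not picked:
        raise ValueError("empty utterance")
    return sum(picked) / len(picked)


def squared_score(posteriors: Sequence[Sequence[float]],
                  alignment: Sequence[int]) -> float:
    """ASSUMPTION: a hypothetical 'squared' variant, not the paper's definition."""
    return mean_aligned_posterior(posteriors, alignment) ** 2


# Toy data (invented): 3 frames, 2 phones, same alignment for both signals.
alignment = [0, 0, 1]
clean = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
degraded = [[0.6, 0.4], [0.5, 0.5], [0.4, 0.6]]

if __name__ == "__main__":
    print("clean score:   ", squared_score(clean, alignment))
    print("degraded score:", squared_score(degraded, alignment))
```

In this toy setting the degraded signal yields flatter posteriors along the aligned phone path and therefore a lower score, which is the kind of behavior an SI predictor of this family would exploit. Mapping such a score to an actual intelligibility value would require fitting against subjective test results, which this sketch does not attempt.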
