Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System

Kenichi Arai; Shoko Araki; Atsunori Ogawa; Keisuke Kinoshita; Tomohiro Nakatani; Toshio Irino

INTERSPEECH 2020

Abstract

Measuring speech intelligibility (SI) still relies mainly on time-consuming and expensive subjective experiments because no versatile objective measure can predict SI. One promising candidate for SI prediction is an approach built on a deep neural network (DNN)-based automatic speech recognition (ASR) system, given the recent major advances in such systems. In this paper, we propose and evaluate SI prediction methods based on the posteriors of DNN-based ASR systems. The posteriors, which are the probabilities of phones given acoustic features, are derived using forced alignments between clean speech and a phone sequence. We evaluated several variations of the posteriors to improve prediction performance. In our experiments, a prediction method using a squared cumulative posterior probability achieved better accuracy than conventional SI predictors based on well-established objective measures (STOI and eSTOI).
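As a rough illustration of what "posteriors derived using forced alignments" can mean in practice, the sketch below picks, for each frame, the posterior of the phone that an alignment assigns to that frame, and averages those values into a single score. It is a minimal sketch under stated assumptions, not the paper's formulation: the function names, the toy posterior matrix, the toy alignment, and the plain averaging step are all hypothetical, and the squared cumulative posterior named in the abstract is not defined here and is not reproduced.

```python
from typing import List, Sequence


def aligned_posteriors(
    posteriors: Sequence[Sequence[float]],
    alignment: Sequence[int],
) -> List[float]:
    """Return, for every frame, the posterior of its aligned phone.

    posteriors[t][k] is assumed to be P(phone k | acoustic features of frame t).
    alignment[t] is assumed to be the phone index a forced aligner gave frame t.
    """
    if len(posteriors) != len(alignment):
        raise ValueError("need exactly one aligned phone index per frame")
    return [frame[phone] for frame, phone in zip(posteriors, alignment)]


def mean_aligned_posterior(
    posteriors: Sequence[Sequence[float]],
    alignment: Sequence[int],
) -> float:
    """Average the aligned-phone posteriors over all frames (illustrative only)."""
    picked = aligned_posteriors(posteriors, alignment)
    if not picked:
        raise ValueError("need at least one frame")
    return sum(picked) / len(picked)


# Hypothetical toy input: 4 frames, 3 phone classes, each row sums to 1.
toy_posteriors = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]
# Hypothetical forced alignment: frames 0-1 -> phone 0, frame 2 -> phone 1, frame 3 -> phone 2.
toy_alignment = [0, 0, 1, 2]

score = mean_aligned_posterior(toy_posteriors, toy_alignment)
print(f"mean aligned posterior: {score:.3f}")
```

A higher score here means the recognizer assigns more probability to the expected phones. Mapping such a score to a predicted intelligibility value would need a further step that this sketch leaves out.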

Topics

Machine Learning > Core Methods > Regression
Deep Learning > Architectures > Neural Networks

Keywords

automatic speech recognition; posterior probability; deep neural network; speech intelligibility prediction; objective measure; squared cumulative posterior

Related papers

Memory Controlled Sequential Self Attention for Sound Recognition 2020

Dual Attention in Time and Frequency Domain for Voice Activity Detection 2020

Automatic Prediction of Speech Intelligibility Based on X-Vectors in the Context of Head and Neck Cancer 2020

A Noise Robust Technique for Detecting Vowels in Speech Signals 2020

Joint Detection of Sentence Stress and Phrase Boundary for Prosody 2020