Neural Zero-Inflated Quality Estimation Model for Automatic Speech Recognition System

Kai Fan; Bo Li; Jiayi Wang; Shiliang Zhang; Boxing Chen; Niyu Ge; Zhijie Yan

INTERSPEECH 2020

Abstract

The performance of automatic speech recognition (ASR) systems is usually evaluated with word error rate (WER), which requires manually transcribed data that are expensive to obtain in real scenarios. In addition, the empirical distribution of WER for most ASR systems tends to put significant mass near zero, making it difficult to model with a single continuous distribution. To address these two issues in ASR quality estimation (QE), we propose a novel neural zero-inflated model that predicts the WER of an ASR result without transcripts. We design a neural zero-inflated beta regression on top of a bidirectional transformer language model conditioned on speech features (speech-BERT). We pre-train speech-BERT with token-level masked language modeling, then fine-tune it with our zero-inflated layer to produce the mixture of discrete and continuous outputs. Experimental results show that our approach outperforms strong baselines on WER prediction.
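To make the "mixture of discrete and continuous outputs" concrete, the following is a minimal sketch of a zero-inflated beta likelihood, not the authors' implementation. It assumes a point mass at WER = 0 with probability `pi_zero` and a Beta(`a`, `b`) density for WER in (0, 1); these names and the (a, b) parameterization are illustrative assumptions, since the abstract does not specify them. In the model described above these quantities would presumably be predicted from speech-BERT features; here they are plain numbers. The sketch also assumes WER below 1, although WER can exceed 1 when there are many insertions.

```python
import math


def beta_log_pdf(y, a, b):
    """Log density of Beta(a, b) at y, for y strictly inside (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y)


def zero_inflated_beta_nll(y, pi_zero, a, b):
    """Negative log-likelihood of a single WER value y in [0, 1).

    pi_zero: assumed probability that the utterance has WER exactly 0.
    a, b:    assumed shape parameters of the beta component (both > 0).
    """
    if y == 0.0:
        # Discrete part: the point mass at zero.
        return -math.log(pi_zero)
    # Continuous part: beta density scaled by the non-zero probability.
    return -(math.log(1.0 - pi_zero) + beta_log_pdf(y, a, b))


def expected_wer(pi_zero, a, b):
    """Mean of the mixture: the zero component contributes nothing."""
    return (1.0 - pi_zero) * a / (a + b)
```

Summing `zero_inflated_beta_nll` over training utterances gives a loss that can fit both the spike at zero and the continuous remainder of the WER distribution, and `expected_wer` gives a single point prediction from the fitted parameters.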

Topics

Machine Learning > Core Methods > Regression

Speech & Audio > Recognition > Automatic Speech Recognition

Deep Learning > Models > Transformers

Keywords

automatic speech recognition, quality estimation, bidirectional transformer, word error rate, pre-training strategy, zero-inflated model

Related papers

Memory Controlled Sequential Self Attention for Sound Recognition 2020

Dual Attention in Time and Frequency Domain for Voice Activity Detection 2020

Automatic Prediction of Speech Intelligibility Based on X-Vectors in the Context of Head and Neck Cancer 2020

A Noise Robust Technique for Detecting Vowels in Speech Signals 2020

Joint Detection of Sentence Stress and Phrase Boundary for Prosody 2020