INTERSPEECH 2019

An Improved Goodness of Pronunciation (GoP) Measure for Pronunciation Evaluation with DNN-HMM System Considering HMM Transition Probabilities

Abstract

Goodness of pronunciation (GoP) is typically formulated with Gaussian mixture model-hidden Markov model (GMM-HMM) based acoustic models, considering HMM state transition probabilities (STPs) and GMM likelihoods of context-dependent phonemes. Deep neural network (DNN)-HMM based acoustic models, on the other hand, employ sub-phonemic (senone) posteriors instead of GMM likelihoods, along with STPs. However, each senone is shared across many states; thus, there is no one-to-one correspondence between senones and states. To circumvent this, most existing works have proposed modifications to the GoP formulation that consider only the posteriors and neglect the STPs. In this work, we derive a GoP formulation that involves both senone posteriors and STPs. Further, we illustrate the steps to implement the proposed GoP formulation in Kaldi, a state-of-the-art automatic speech recognition toolkit. Experiments are conducted on English data collected from Indian speakers, using acoustic models trained on native English data from the LibriSpeech and Fisher-English corpora. The correlation coefficient between the GoP scores and the expert ratings improves by up to 14.89% (relative) with the proposed approach compared to the best of the existing formulations that do not include STPs.
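The evaluation measure mentioned in the abstract, a relative improvement in the correlation between GoP scores and expert ratings, can be illustrated with a minimal sketch. It assumes a Pearson correlation coefficient (the abstract does not say which coefficient is used), and every number below is hypothetical toy data, not a result from the paper. The function names `pearson` and `relative_improvement` are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def relative_improvement(baseline, proposed):
    """Relative improvement of `proposed` over `baseline`, in percent."""
    return 100.0 * (proposed - baseline) / baseline


# Hypothetical toy data (assumption): five utterances with expert ratings
# and scores from two GoP formulations. None of these values are from the paper.
expert_ratings = [1, 2, 3, 4, 5]
baseline_scores = [0.2, 0.1, 0.4, 0.3, 0.5]  # e.g. a posterior-only formulation
proposed_scores = [0.1, 0.2, 0.3, 0.5, 0.4]  # e.g. a formulation that adds STPs

r_baseline = pearson(baseline_scores, expert_ratings)
r_proposed = pearson(proposed_scores, expert_ratings)

print(f"baseline r = {r_baseline:.3f}, proposed r = {r_proposed:.3f}")
print(f"relative improvement = {relative_improvement(r_baseline, r_proposed):.2f}%")
```

A relative figure such as the 14.89% reported above is a ratio of the gain to the baseline correlation, so it should not be read as an absolute increase in the correlation coefficient.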
