Optimizing Speech Recognition Evaluation Using Stratified Sampling

Janne Pylkkönen; Thomas Drugman; Max Bisani

INTERSPEECH 2016

Abstract

Producing enough high-quality transcriptions for accurate and reliable evaluation of an automatic speech recognition (ASR) system can be costly. It is therefore desirable to minimize the manual transcription work needed to produce metrics with an agreed-upon precision. In this paper we demonstrate how to improve ASR evaluation precision using stratified sampling. We show that by altering the sampling, the deviations observed in the error metrics can be reduced by up to 30% compared to random sampling, or alternatively, the same precision can be obtained on datasets about 30% smaller. We compare different variants of stratified sampling, including a novel sample allocation scheme tailored for word error rate (WER). Experimental evidence is provided to assess the effect of different sampling schemes on evaluation precision.
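For readers unfamiliar with stratified sampling, the Python sketch below contrasts two textbook allocation rules: proportional allocation (n_h proportional to N_h, the number of utterances in stratum h) and Neyman allocation (n_h proportional to N_h * S_h, where S_h is the standard deviation of the per-utterance metric in stratum h). It also shows a combined ratio estimator for WER, since WER is a ratio of total errors to total reference words. This is a minimal illustration under assumed inputs (known stratum sizes, pre-estimated S_h values, and per-utterance (errors, reference words) pairs); all function names are hypothetical, and the paper's own WER-tailored allocation scheme is not reproduced here.

```python
from typing import Sequence


def _round_preserving_total(quotas: Sequence[float], n: int) -> list[int]:
    """Largest-remainder rounding: floor each quota, then give the leftover
    samples to the strata with the largest fractional parts, so the
    integer allocations sum exactly to n."""
    floors = [int(q) for q in quotas]  # quotas are non-negative, so int() floors
    remainder = n - sum(floors)
    # Stable sort: ties in fractional part are broken by stratum order.
    order = sorted(range(len(quotas)),
                   key=lambda i: quotas[i] - floors[i], reverse=True)
    for i in order[:remainder]:
        floors[i] += 1
    return floors


def proportional_allocation(sizes: Sequence[int], n: int) -> list[int]:
    """Allocate n samples with n_h proportional to the stratum size N_h."""
    total = sum(sizes)
    return _round_preserving_total([n * s / total for s in sizes], n)


def neyman_allocation(sizes: Sequence[int], stds: Sequence[float],
                      n: int) -> list[int]:
    """Allocate n samples with n_h proportional to N_h * S_h (Neyman).

    Assumes the per-stratum standard deviations S_h were estimated
    beforehand, e.g. from a pilot sample (hypothetical input)."""
    weights = [s * d for s, d in zip(sizes, stds)]
    total = sum(weights)
    return _round_preserving_total([n * w / total for w in weights], n)


def stratified_wer(sizes: Sequence[int],
                   samples: Sequence[Sequence[tuple[int, int]]]) -> float:
    """Combined ratio estimate of WER from a stratified sample.

    samples[h] holds (errors, reference_words) pairs for the utterances
    drawn from stratum h; sizes[h] is N_h. Each stratum's sample means are
    scaled by N_h to estimate its total errors and total words."""
    est_errors = 0.0
    est_words = 0.0
    for n_total, pairs in zip(sizes, samples):
        n_h = len(pairs)
        est_errors += n_total * sum(e for e, _ in pairs) / n_h
        est_words += n_total * sum(w for _, w in pairs) / n_h
    return est_errors / est_words
```

With strata of 600, 300 and 100 utterances and a budget of 50, proportional allocation gives [30, 15, 5], whereas Neyman allocation with S_h = [1.0, 2.0, 4.0] shifts samples toward the more variable strata and gives [19, 19, 12]. Note that Neyman allocation can request more samples than a small, highly variable stratum contains (n_h > N_h); a practical version would cap such strata and redistribute the surplus.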