Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications

Korbinian Kuhn; Verena Kersken; Gottfried Zimmermann

2024 INTERSPEECH INTERSPEECH 2024

Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications

Abstract

The Word Error Rate (WER) is the common measure of accuracy for Automatic Speech Recognition (ASR). Transcripts are usually pre-processed by substituting specific characters to account for non-semantic differences. As a result of this normalisation, information on the accuracy of punctuation or capitalisation is lost. We present a non-destructive, token-based approach using an extended Levenshtein distance algorithm to compute a robust WER and additional orthographic metrics. Transcription errors are also classified more granularly by existing string similarity and phonetic algorithms. An evaluation on several datasets demonstrates the practical equivalence of our approach compared to common WER computations. We also provide an exemplary analysis of derived use cases, such as a punctuation error rate, and a web application for interactive use and visualisation of our implementation. The code is available open-source.

🧭 Keyword Pioneer — string similarity

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Machine Learning, Natural Language Processing, Speech & Audio

Authors

Korbinian Kuhn , Verena Kersken , Gottfried Zimmermann

Topics

Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

automatic speech recognition error classification word error rate levenshtein distance string similarity orthographic metric

Download PDF

Related papers

Reshape Dimensions Network for Speaker Recognition 2024

RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification 2024

Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch 2024

Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions 2024

K-means and hierarchical clustering of f0 contours 2024