Improving Phonetic Transcriptions of Children’s Speech by Pronunciation Modelling with Constrained CTC-Decoding

Lars Rumberg; Christopher Gebauer; Hanna Ehlert; Ulrike Lüdtke; Jörn Ostermann

INTERSPEECH 2022

Abstract

Language sample analysis (LSA) is a powerful tool for both therapeutic applications and research on child speech and language development. Nevertheless, it is not routinely used because manual transcription and analysis are costly. Assistance from automatic speech recognition for children has the potential to enable widespread use of LSA. However, the development of modern speech recognition systems relies heavily on large-scale datasets and therefore faces the same obstacle of high transcription cost as LSA itself. In this paper, we study how cheaply transcribed child speech, i.e., speech with only an orthographic transcription, can be improved on a phonetic level by leveraging a CTC-based automatic speech recognition model trained on a small phonetically transcribed dataset. We constrain the CTC decoding by modeling pronunciation variation given the orthographic transcription using weighted finite-state automata. Our experiments show that our method reduces the phone error rate of the transcription by 14% relative. Additionally, we show how the improved transcript can in turn be leveraged to improve the training of a new model.
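The idea of constraining CTC decoding to pronunciations that are plausible for a known orthographic transcript can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the pronunciation model is simplified to a weighted list of candidate phone sequences (equivalent to a weighted automaton that is a union of linear paths), and the function names `ctc_log_likelihood` and `constrained_decode`, as well as all numbers, are hypothetical. Each candidate is scored by the CTC forward algorithm, and its log weight is added.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) for a few values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_log_likelihood(log_probs, labels, blank=0):
    """CTC forward algorithm: log-probability of `labels`,
    summed over all frame-level alignments.

    log_probs: list of T frames, each a list of per-symbol log-probs.
    labels:    sequence of symbol indices (without blanks).
    """
    # Extended sequence with blanks: [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    num_states = len(ext)

    alpha = [NEG_INF] * num_states
    alpha[0] = log_probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new_alpha = [NEG_INF] * num_states
        for s in range(num_states):
            candidates = [alpha[s]]
            if s >= 1:
                candidates.append(alpha[s - 1])
            # Skipping a blank is only allowed between different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(*candidates) + log_probs[t][ext[s]]
        alpha = new_alpha

    if num_states == 1:
        return alpha[0]
    return logsumexp(alpha[-1], alpha[-2])


def constrained_decode(log_probs, variants, blank=0):
    """Pick the best pronunciation among the allowed variants.

    variants: dict mapping a phone-index tuple to its log weight
              (hypothetical pronunciation-model scores).
    Returns (best_variant, best_score).
    """
    best, best_score = None, NEG_INF
    for phones, log_weight in variants.items():
        score = log_weight + ctc_log_likelihood(log_probs, phones, blank)
        if score > best_score:
            best, best_score = phones, score
    return best, best_score


# Toy example: symbols 0 = blank, 1 and 2 = two competing phones.
frames = [[0.1, 0.6, 0.3], [0.6, 0.3, 0.1]]
log_frames = [[math.log(p) for p in frame] for frame in frames]
variants = {(1,): math.log(0.5), (2,): math.log(0.5)}
best_variant, _ = constrained_decode(log_frames, variants)
```

In this toy setup the decoder can only return one of the listed variants, so acoustic evidence for phone sequences outside the pronunciation model is ignored. A full weighted finite-state automaton would encode many more variants compactly and would typically be searched with a toolkit rather than enumerated.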

Topics

Machine Learning > Core Methods > Representation Learning
Speech & Audio > Recognition > Speech Recognition

Keywords

constrained decoding, child speech, phonetic transcription, pronunciation modeling, CTC decoding
