Multi-Scale Context Adaptation for Improving Child Automatic Speech Recognition in Child-Adult Spoken Interactions

Manoj Kumar; Daniel Bone; Kelly McWilliams; Shanna Williams; Thomas D. Lyon; Shrikanth S. Narayanan

INTERSPEECH 2017

Abstract

The mutual influence of participant behavior in a dyadic interaction has been studied for different modalities and quantified by computational models. In this paper, we consider the task of automatic recognition of children’s speech in the context of child-adult spoken interactions during interviews of children suspected to have been maltreated. Our long-term goal is to provide insights within this immensely important, sensitive domain through large-scale lexical and paralinguistic analysis. We demonstrate improvement in child speech recognition accuracy by conditioning on both the domain and the interlocutor’s (adult) speech. Specifically, we use information from the automatic speech recognizer outputs for the adult’s speech, for which we have more reliable estimates, to modify the recognition system for the child’s speech in an unsupervised manner. By adapting first at the session level and then at the utterance level, we demonstrate an absolute improvement of up to 28% in WER and 55% in perplexity over the baseline results. We also report results of a parallel human speech recognition (HSR) experiment in which annotators are asked to transcribe the child’s speech under two conditions: with and without contextual speech information. The demonstrated ASR improvements and the HSR experiment illustrate the importance of context in aiding child speech recognition, whether by humans or computers.
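The abstract does not spell out the adaptation mechanism, so the following is only a minimal sketch of one common way to condition a language model on interlocutor context: linearly interpolating a background unigram model with one estimated from the adult's recognized words, then comparing perplexity on the child's words. The toy sentences, the add-alpha smoothing, the interpolation weight `lam`, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(t for t in tokens if t in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(background, context, lam):
    """Linear interpolation: lam * context + (1 - lam) * background."""
    return {w: lam * context[w] + (1.0 - lam) * background[w] for w in background}


def perplexity(lm, tokens):
    """Perplexity of a token sequence under a unigram model."""
    log_prob = sum(math.log(lm[t]) for t in tokens)
    return math.exp(-log_prob / len(tokens))


# Toy data (hypothetical): generic background text, an adult turn as
# produced by ASR, and the child's reply used only for evaluation.
background_text = "the cat sat on the mat and the dog ran".split()
adult_asr = "did you go to school with your mom".split()
child_ref = "i go to school with mom".split()

vocab = set(background_text) | set(adult_asr) | set(child_ref)

background = unigram_lm(background_text, vocab)
context = unigram_lm(adult_asr, vocab)  # built without any child transcripts
adapted = interpolate(background, context, lam=0.5)  # lam is an assumed weight

ppl_base = perplexity(background, child_ref)
ppl_adapted = perplexity(adapted, child_ref)
print(f"baseline perplexity: {ppl_base:.2f}")
print(f"adapted perplexity:  {ppl_adapted:.2f}")
```

In this toy setup the adapted model assigns more probability to words the adult has just used, so the child's reply becomes less surprising. A session-then-utterance scheme, as the abstract describes, could plausibly be approximated by building `context` first from all adult turns in a session and then from the turn immediately preceding each child utterance, though that mapping is an assumption here.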

Topics

Machine Learning > Application Areas > Domain Adaptation

Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

automatic speech recognition, word error rate, child speech recognition, context adaptation
