Analysis and Tuning of a Voice Assistant System for Dysfluent Speech

Vikramjit Mitra; Zifang Huang; Colin Lea; Lauren Tooley; Sarah Wu; Darren Botten; Ashwini Palekar; Shrinath Thelapurath; Panayiotis Georgiou; Sachin Kajarekar; Jefferey Bigham

INTERSPEECH 2021

Abstract

Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice-operated systems do not work. Current speech recognition systems are trained primarily on data from fluent speakers and, as a consequence, do not generalize well to speech with dysfluencies such as sound or word repetitions, sound prolongations, or audible blocks. This work presents a quantitative analysis of a consumer speech recognition system on speech from individuals who stutter, along with production-oriented approaches for improving performance on common voice assistant tasks (e.g., “what is the weather?”). At baseline, this system introduces a significant number of insertion and substitution errors, resulting in intended speech Word Error Rates (isWER) that are 13.64% worse (absolute) for individuals with fluency disorders. We show that by simply tuning the decoding parameters in an existing hybrid speech recognition system, one can improve isWER by 24% (relative) for individuals with fluency disorders. Tuning these parameters translates to 3.6% better domain recognition and 1.7% better intent recognition relative to the default setup for the 18 study participants across all stuttering severities.
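The abstract reports results in terms of insertion and substitution errors and distinguishes absolute from relative changes in word error rate. The sketch below illustrates these quantities with a standard word-level edit-distance WER. It is not the authors' scoring tool, and it assumes that isWER corresponds to scoring the recognizer output against a transcript of the intended speech rather than the verbatim dysfluent utterance. The function names `wer_counts` and `relative_reduction` and the example sentences are hypothetical.

```python
def wer_counts(reference, hypothesis):
    """Word-level edit distance with substitution/deletion/insertion counts.

    `reference` is assumed to be the intended-speech transcript and
    `hypothesis` the recognizer output (both plain strings).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)      # delete every reference word
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)      # insert every hypothesis word

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            c, s, d, n = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, n)              # exact match, no cost
            else:
                diag = (c + 1, s + 1, d, n)      # substitution
            c, s, d, n = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, n)
            c, s, d, n = dp[i][j - 1]
            insertion = (c + 1, s, d, n + 1)
            dp[i][j] = min(diag, deletion, insertion, key=lambda t: t[0])

    cost, subs, dels, ins = dp[-1][-1]
    return {"wer": cost / len(ref), "sub": subs, "del": dels, "ins": ins}


def relative_reduction(baseline_wer, tuned_wer):
    """Relative improvement: fraction of the baseline error removed."""
    return (baseline_wer - tuned_wer) / baseline_wer


# A word repetition recognized verbatim counts as an insertion
# against the intended transcript.
repeated = wer_counts("what is the weather", "what what is the weather")
print(repeated)

# Illustrative numbers only (not from the paper): an absolute drop of
# 5 points from a 20% baseline is a 25% relative reduction.
print(relative_reduction(0.20, 0.15))
```

Under this assumption, a repeated word that the recognizer transcribes literally is counted as an insertion against the intended transcript, which matches the error types the abstract highlights.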

Topics

Machine Learning > Optimization & Theory > Optimization
Speech & Audio > Recognition > Speech Recognition

Keywords

speech recognition, word error rate, voice assistant, dysfluent speech, decoding parameter tuning
