Exploiting Visual Features Using Bayesian Gated Neural Networks for Disordered Speech Recognition

Shansong Liu; Shoukang Hu; Yi Wang; Jianwei Yu; Rongfeng Su; Xunying Liu; Helen Meng

INTERSPEECH 2019

Abstract

Automatic speech recognition (ASR) for disordered speech is a challenging task. People with speech disorders such as dysarthria often have physical disabilities, leading to severe degradation of speech quality, highly variable voice characteristics and a large mismatch against normal speech. It is also difficult to record large amounts of high-quality audio-visual data for developing audio-visual speech recognition (AVSR) systems. To address these issues, a novel Bayesian gated neural network (BGNN) based AVSR approach is proposed. Speaker-level Bayesian gated control of the contributions from visual features allows a more robust fusion of the audio and video modalities. A posterior distribution over the gating parameters is used to model their uncertainty given limited and variable disordered speech data. Experiments conducted on the UASpeech dysarthric speech corpus suggest that the proposed BGNN AVSR system consistently outperforms state-of-the-art deep neural network (DNN) baseline ASR and AVSR systems by 4.5% and 4.7% absolute (14.9% and 15.5% relative) in word error rate.
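The abstract's idea of speaker-level gating with a posterior over the gate parameters can be illustrated with a small sketch. This is one possible reading, not the paper's implementation: the Gaussian posterior, the sigmoid gate, fusion by concatenation, the Monte Carlo averaging, and every function and parameter name below (`sample_gates`, `gated_fusion`, `expected_fusion`, `mu`, `sigma`) are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    """Squash a real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_gates(mu, sigma, rng):
    """Draw one gate vector (assumed form): theta ~ N(mu, sigma^2), gate = sigmoid(theta)."""
    return [sigmoid(rng.gauss(m, s)) for m, s in zip(mu, sigma)]


def gated_fusion(audio, visual, gates):
    """Concatenate audio features with gate-scaled visual features."""
    return list(audio) + [g * v for g, v in zip(gates, visual)]


def expected_fusion(audio, visual, mu, sigma, num_samples=100, seed=0):
    """Monte Carlo average of the fused vector over sampled gates."""
    rng = random.Random(seed)
    total = [0.0] * (len(audio) + len(visual))
    for _ in range(num_samples):
        fused = gated_fusion(audio, visual, sample_gates(mu, sigma, rng))
        total = [t + f for t, f in zip(total, fused)]
    return [t / num_samples for t in total]


if __name__ == "__main__":
    # Hypothetical per-speaker posterior parameters (illustrative values only).
    speaker_mu = [0.0, 2.0]
    speaker_sigma = [1.0, 0.1]
    print(expected_fusion([0.3, -0.1], [1.0, 1.0], speaker_mu, speaker_sigma))
```

In this sketch, a larger `sigma` expresses more uncertainty about how much a speaker's visual features should contribute, and averaging over sampled gates smooths that contribution instead of committing to a single point estimate.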

Topics

Machine Learning > Optimization & Theory > Bayesian Inference
Deep Learning > Architectures > Neural Networks
Speech & Audio > Recognition > Speech Recognition
Machine Learning > Bayesian & Probabilistic > Bayesian Learning
Healthcare & Medicine > Clinical > Medical AI

Keywords

Bayesian inference; automatic speech recognition; phoneme recognition; visual speech recognition; visual feature; audio-visual speech recognition; disordered speech recognition; neural network; Bayesian gated neural network
