Speech Audio Super-Resolution for Speech Recognition

Xinyu Li; Venkata Chebiyyam; Katrin Kirchhoff

INTERSPEECH 2019

Abstract

Automatic bandwidth extension (restoring high-frequency information from low-sample-rate audio) has a number of applications in speech processing. We introduce an end-to-end, deep-learning-based system for speech bandwidth extension intended for use in a downstream automatic speech recognition (ASR) system. Specifically, we propose a conditional generative adversarial network enriched with ASR-specific loss functions, designed to upsample the speech audio while maintaining good ASR performance. Evaluations on the Speech Commands dataset and the LibriSpeech corpus show that our approach outperforms a number of traditional bandwidth extension methods with respect to word error rate.
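Since the abstract reports results as word error rate, a small illustration of that metric may help. The following is a minimal sketch, assuming whitespace tokenization and a standard word-level Levenshtein alignment; the function name `word_error_rate` is hypothetical, and the scoring tool actually used in the paper is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared exactly
    (no text normalization), which may differ from the paper's scoring setup.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "turn light off" against the reference "turn the light on" counts one deletion and one substitution over four reference words, giving a word error rate of 0.5. Lower values indicate that the recognizer's output is closer to the reference transcript.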

Topics

Deep Learning > Models > Generative Models
Speech & Audio > Recognition > Automatic Speech Recognition
Speech & Audio > Synthesis > Speech Enhancement

Keywords

automatic speech recognition, speech enhancement, generative adversarial network, word error rate, speech bandwidth extension, bandwidth extension, audio super-resolution, speech super-resolution
