INTERSPEECH 2019
Speech Audio Super-Resolution for Speech Recognition
Abstract
Automatic bandwidth extension (restoring high-frequency information from low-sample-rate audio) has a number of applications in speech processing. We introduce an end-to-end, deep-learning-based system for speech bandwidth extension for use in a downstream automatic speech recognition (ASR) system. Specifically, we propose a conditional generative adversarial network enriched with ASR-specific loss functions, designed to upsample the speech audio while maintaining good ASR performance. Evaluations on the Speech Commands dataset and the LibriSpeech corpus show that our approach outperforms a number of traditional bandwidth extension methods with respect to word error rate.
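The abstract reports results in terms of word error rate (WER). As a reminder of what that metric measures, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the ASR hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the authors' scoring code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deletion ("the") and one substitution ("on" -> "off") over 4 reference words.
    print(word_error_rate("turn the light on", "turn light off"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.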