A Joint End-to-End and DNN-HMM Hybrid Automatic Speech Recognition System with Transferring Sharable Knowledge

Tomohiro Tanaka; Ryo Masumura; Takafumi Moriya; Takanobu Oba; Yushi Aono

INTERSPEECH 2019

Abstract

This paper presents a joint end-to-end and deep neural network-hidden Markov model (DNN-HMM) hybrid automatic speech recognition (ASR) system whose two subsystems share network components. Recent studies have shown that end-to-end ASR systems perform competitively with DNN-HMM hybrid ASR systems. The two kinds of system have different advantages: the end-to-end ASR system estimates with a single, totally optimized model, while the DNN-HMM hybrid ASR system processes speech stably in a frame-by-frame manner. In our previous study, we proposed a method that uses an end-to-end ASR system to rescore hypotheses generated by a DNN-HMM hybrid ASR system. However, that conventional method cannot efficiently leverage these advantages because the network components of the two systems are modeled independently. To tackle this problem, we propose a joint end-to-end and DNN-HMM hybrid ASR system that shares network components in order to transfer knowledge between the systems. In the proposed method, the end-to-end ASR system is enhanced with the output of an internal layer of the DNN acoustic model in the DNN-HMM hybrid ASR system. This enables us to efficiently leverage sharable information to improve the joint ASR system. Experimental results show that the proposed method outperforms the conventional method.
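
The sharing idea in the abstract can be illustrated with a small, untrained toy model. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not say how the internal-layer output is fed to the end-to-end system, so concatenating it with the acoustic features is an assumption here, and all class names, layer sizes, and random weights are hypothetical.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, bias) for a fully connected layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(x, weights, bias):
    """Compute W x + b for a single frame."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class HybridAcousticModel:
    """Toy frame-level DNN: features -> hidden layer -> HMM-state posteriors."""

    def __init__(self, n_feat, n_hidden, n_states, rng):
        self.hidden = init_layer(n_feat, n_hidden, rng)
        self.output = init_layer(n_hidden, n_states, rng)

    def forward(self, frame):
        # The hidden activation is the "internal layer output" that gets shared.
        h = [math.tanh(v) for v in affine(frame, *self.hidden)]
        return h, softmax(affine(h, *self.output))


class EndToEndEncoder:
    """Toy encoder layer fed with [acoustic features ; shared hidden output].

    Concatenation is an assumption made for this sketch.
    """

    def __init__(self, n_feat, n_shared, n_enc, rng):
        self.layer = init_layer(n_feat + n_shared, n_enc, rng)

    def forward(self, frame, shared):
        return [math.tanh(v) for v in affine(frame + shared, *self.layer)]


rng = random.Random(0)
N_FEAT, N_HIDDEN, N_STATES, N_ENC = 4, 6, 3, 5  # hypothetical sizes
acoustic_model = HybridAcousticModel(N_FEAT, N_HIDDEN, N_STATES, rng)
encoder = EndToEndEncoder(N_FEAT, N_HIDDEN, N_ENC, rng)

# A fake 10-frame utterance of random features, processed frame by frame.
utterance = [[rng.gauss(0.0, 1.0) for _ in range(N_FEAT)] for _ in range(10)]
posteriors, encoded = [], []
for frame in utterance:
    hidden, post = acoustic_model.forward(frame)
    posteriors.append(post)                         # used by the hybrid system
    encoded.append(encoder.forward(frame, hidden))  # used by the end-to-end system
```

In this sketch the acoustic model's hidden activation is computed once per frame and reused by both branches, which is the sense in which the two systems share a network component. Training, decoding, and rescoring are omitted.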

Authors

Tomohiro Tanaka, Ryo Masumura, Takafumi Moriya, Takanobu Oba, Yushi Aono

Topics

Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

knowledge transfer, acoustic model, hidden Markov model, deep neural network, hybrid system, end-to-end speech recognition
