Unfolded Deep Recurrent Convolutional Neural Network with Jump Ahead Connections for Acoustic Modeling

Dung T. Tran; Marc Delcroix; Shigeki Karita; Michael Hentschel; Atsunori Ogawa; Tomohiro Nakatani

2017 INTERSPEECH INTERSPEECH 2017

Unfolded Deep Recurrent Convolutional Neural Network with Jump Ahead Connections for Acoustic Modeling

Abstract

Recurrent neural networks (RNNs) with jump ahead connections have been used in the computer vision tasks. Still, they have not been investigated well for automatic speech recognition (ASR) tasks. In other words, unfolded RNN has been shown to be an effective model for acoustic modeling tasks. This paper investigates how to elaborate a sophisticated unfolded deep RNN architecture in which recurrent connections use a convolutional neural network (CNN) to model a short-term dependence between hidden states. In this study, our unfolded RNN architecture is a CNN that process a sequence of input features sequentially. Each time step, the CNN inputs a small block of the input features and the output of the hidden layer from the preceding block in order to compute the output of its hidden layer. In addition, by exploiting either one or multiple jump ahead connections between time steps, our network can learn long-term dependencies more effectively. We carried experiments on the CHiME 3 task showing the effectiveness of our proposed approach.

🌉 Interdisciplinary Bridge — Deep Learning and Speech & Audio

🧭 Keyword Pioneer — jump ahead connection

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Dung T. Tran , Marc Delcroix , Shigeki Karita , Michael Hentschel , Atsunori Ogawa , Tomohiro Nakatani

Topics

Deep Learning > Architectures > Neural Networks Deep Learning > Techniques > Model Architecture Speech & Audio > Recognition > Speech Recognition

Keywords

automatic speech recognition acoustic modeling convolutional neural network recurrent neural network jump ahead connection unfolded architecture

Download PDF

Related papers

Description of the Munich-Passau Snore Sound Corpus (MPSSC) 2017

A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification 2017

Binaural Reverberant Speech Separation Based on Deep Neural Networks 2017

Building Audio-Visual Phonetically Annotated Arabic Corpus for Expressive Text to Speech 2017

A Comparison of Danish Listeners’ Processing Cost in Judging the Truth Value of Norwegian, Swedish, and English Sentences 2017