A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting

Ye Bai; Jiangyan Yi; Jianhua Tao; Zhengqi Wen; Zhengkun Tian; Chenghao Zhao; Cunhang Fan

INTERSPEECH 2019

Abstract

Keyword spotting requires a small memory footprint to run on mobile devices. However, previous work still uses several hundred thousand parameters to achieve good performance. To address this issue, we propose a time delay neural network with shared weight self-attention for small-footprint keyword spotting. Sharing weights reduces the number of self-attention parameters without degrading performance. The models are evaluated on the publicly available Google Speech Commands dataset. Our model has 12K parameters, about 1/20 of the 239K used by the state-of-the-art ResNet model. The proposed model achieves an error rate of 4.19%, which is comparable to that of the ResNet model.
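The weight-sharing idea in the abstract can be illustrated with a minimal sketch. It assumes that "shared weight" means one projection matrix is reused for the query, key, and value mappings of a single-head scaled dot-product self-attention layer; the abstract does not spell out these details, so the function names, the dimensions, and the attention variant below are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over frames x (T x d_in)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]           # transpose of k
    scores = matmul(q, k_t)                        # T x T similarity matrix
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)                      # T x d_att


def shared_weight_self_attention(x, w):
    """Assumed sharing scheme: one matrix w serves as query, key, and value projection."""
    return self_attention(x, w, w, w)


def projection_params(d_in, d_att, shared):
    """Count projection weights (biases ignored): one matrix if shared, else three."""
    return d_in * d_att * (1 if shared else 3)


if __name__ == "__main__":
    random.seed(0)
    d_in, d_att, frames = 4, 3, 5                  # hypothetical sizes, not from the paper
    x = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(frames)]
    w = [[random.gauss(0, 1) for _ in range(d_att)] for _ in range(d_in)]
    out = shared_weight_self_attention(x, w)
    print(len(out), len(out[0]))
    print(projection_params(d_in, d_att, shared=False),
          projection_params(d_in, d_att, shared=True))
```

Under this assumption the projection weights of the attention layer shrink by a factor of three. The overall 12K versus 239K comparison in the abstract also reflects the rest of the architecture, which this sketch does not model.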

Topics

Speech & Audio > Recognition > Speech Recognition

Keywords

model compression, self-attention mechanism, keyword spotting, parameter efficiency, time delay neural network
