INTERSPEECH 2021
End-to-End Transformer-Based Open-Vocabulary Keyword Spotting with Location-Guided Local Attention
Abstract
Open-vocabulary keyword spotting (KWS) aims to detect arbitrary keywords in continuous speech, allowing users to define their own keywords. In this paper, we propose a novel location-guided end-to-end (E2E) keyword spotting system. First, we predict the endpoints of the keyword within the full utterance using an attention mechanism. Second, we compute the probability that the keyword is present by fusing the located keyword speech segment with the keyword text through local attention. Results on the LibriSpeech and Google Speech Commands datasets show that the proposed method significantly outperforms the baseline method and the latest small-footprint E2E KWS method.
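The two stages described in the abstract can be illustrated with a toy sketch. This is not the authors' model: the abstract gives no architectural details, so the dot-product attention, the thresholding rule used to pick endpoints, the mean-similarity fusion, and the names `locate_keyword` and `keyword_probability` are all assumptions made purely for illustration. Speech frames and keyword tokens are assumed to be already embedded as equal-length vectors.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def locate_keyword(speech, text, threshold=0.5):
    """Stage 1 (assumed form): attend from each keyword token to all speech
    frames, then return the first and last frame whose peak attention weight
    is at least `threshold` times the maximum. A trained system would predict
    the endpoints instead of thresholding."""
    attn = [softmax([dot(q, f) for f in speech]) for q in text]
    frame_score = [max(row[t] for row in attn) for t in range(len(speech))]
    cutoff = threshold * max(frame_score)
    hits = [t for t, s in enumerate(frame_score) if s >= cutoff]
    return hits[0], hits[-1]


def keyword_probability(speech, text, start, end):
    """Stage 2 (assumed form): local attention restricted to the located
    segment [start, end]; the token/context similarities are averaged and
    squashed with a sigmoid to give an existence probability."""
    segment = speech[start:end + 1]
    sims = []
    for q in text:
        weights = softmax([dot(q, f) for f in segment])
        context = [sum(w * f[d] for w, f in zip(weights, segment))
                   for d in range(len(q))]
        sims.append(dot(q, context))
    score = sum(sims) / len(sims)
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical toy data: six 2-dimensional frames, keyword occupies frames 2-3.
speech = [[0, 0], [0, 0], [4, 0], [0, 4], [0, 0], [0, 0]]
keyword = [[2, 0], [0, 2]]

start, end = locate_keyword(speech, keyword)
prob = keyword_probability(speech, keyword, start, end)
print(start, end, round(prob, 3))
```

Restricting the second stage to the located segment means the final score is computed only from frames that the first stage judged relevant, which is the intuition behind "location-guided" local attention as the abstract describes it.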
Authors
Bo Wei, Meirong Yang, Tao Zhang, Xiao Tang, Xing Huang, Kyuhong Kim, Jaeyun Lee, Kiho Cho, Sung-Un Park