2020 INTERSPEECH INTERSPEECH 2020

Deep Template Matching for Small-Footprint and Configurable Keyword Spotting

Abstract

Keyword spotting (KWS) is a very important technique for human–machine interaction to detect a trigger phrase and voice commands. In practice, a popular demand for KWS is to conveniently define the keywords by consumers or device vendors. In this paper, we propose a novel template matching approach for KWS based on end-to-end deep learning method, which utilizes an attention mechanism to match the input voice to the keyword templates in high-level feature space. The proposed approach only requires very limited voice samples (at least only one sample) to register a new keyword without any retraining. We conduct experiments on the publicly available Google speech commands dataset. The experimental results demonstrate that our method outperforms baseline methods while allowing for a flexible configuration.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Speech & Audio
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio