2021 INTERSPEECH INTERSPEECH 2021

Energy-Friendly Keyword Spotting System Using Add-Based Convolution

Abstract

Wake-up keyword of a keyword spotting (KWS) system represents brand name of a smart device. Performance of KWS is also crucial for modern speech based human-device interaction. An on-device KWS with both high accuracy and low power consumption is desired. We propose a KWS with add-based convolution layers, namely Add TC-ResNet. Add-based convolution paves a new way to reduce power consumption of KWS system, as addition is more energy efficient than multiplication at hardware level. On Google Speech Commands dataset V2, Add TC-ResNet achieves an accuracy of 97.1%, with 99% of multiplication operations are replaced by addition operations. The result is competitive to a state-of-the-art fully multiplication-based TC-ResNet KWS. We also investigate knowledge distillation and a mixed addition-multiplication design for the proposed KWS, which leads to further performance improvement.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Speech & Audio
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio