Binary Speech Features for Keyword Spotting Tasks

Alexandre Riviello; Jean-Pierre David

INTERSPEECH 2019

Abstract

Keyword spotting is a classification task that aims to detect a specific set of spoken words. This type of task generally runs on a power-constrained device such as a smartphone. One way to reduce the power consumption of a keyword spotting algorithm (typically a neural network) is to reduce the precision of the network weights and activations. In this paper, we propose a new representation of speech features that is better suited to low-precision networks and compatible with binary/ternary neural networks. The new representation is based on the log-Mel spectrogram and models the variation of power over time. Tested on a ResNet, this representation produces results nearly as accurate as full-precision MFCCs, which are traditionally used in speech recognition applications.
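To make the idea of a low-precision feature that "models the variation of power over time" more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the log-Mel spectrogram is already available as an array of shape (n_mels, n_frames), takes the frame-to-frame difference in each Mel band, and keeps only its sign. The function names, the ternary threshold value, and the toy input are all hypothetical; the abstract does not specify how the paper's features are actually computed.

```python
import numpy as np


def binary_delta_features(log_mel: np.ndarray) -> np.ndarray:
    """Map a log-Mel spectrogram (n_mels, n_frames) to values in {-1, +1}.

    Assumption for illustration: +1 means the log-power in a Mel band
    rose from one frame to the next, -1 means it did not.
    """
    delta = np.diff(log_mel, axis=1)  # shape: (n_mels, n_frames - 1)
    return np.where(delta > 0, 1, -1).astype(np.int8)


def ternary_delta_features(log_mel: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Map a log-Mel spectrogram to values in {-1, 0, +1}.

    Changes smaller in magnitude than `threshold` (a hypothetical
    parameter) are treated as "no change" and encoded as 0.
    """
    delta = np.diff(log_mel, axis=1)
    out = np.zeros(delta.shape, dtype=np.int8)
    out[delta > threshold] = 1
    out[delta < -threshold] = -1
    return out


# Toy input: 2 Mel bands, 4 frames of made-up log-power values.
log_mel = np.array([[0.0, 1.0, 0.8, 2.0],
                    [3.0, 3.1, 1.0, 1.0]])

binary = binary_delta_features(log_mel)
ternary = ternary_delta_features(log_mel, threshold=0.5)
print(binary)
print(ternary)
```

Each output value fits in one or two bits, which is the property that would make such a feature a natural input for a binary or ternary network. How closely this matches the representation evaluated in the paper is not something the abstract alone can confirm.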

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Application Areas > Efficient Computing
Deep Learning > Architectures > Neural Networks
Speech & Audio > Recognition > Speech Recognition
Machine Learning > Application Areas > Model Compression

Keywords

model compression, speech recognition, keyword spotting, feature representation, low-precision computing, binary neural network, log-mel spectrogram
