INTERSPEECH 2022

ResectNet: An Efficient Architecture for Voice Activity Detection on Mobile Devices

Abstract

We present ResectNet, a RESource Efficient and CompacT Convolutional Recurrent Neural Network architecture for Voice Activity Detection (VAD) on mobile devices, which achieves state-of-the-art performance with fewer than 12k parameters. ResectNet operates on raw audio signals and consists of sinc convolutions, depthwise convolutions, grouped pointwise convolutions, a frequency shift module, and a gated recurrent unit (GRU). We propose a simple width-multiplier hyperparameter that scales ResectNet to the desired trade-off between efficiency and performance. We present a detailed ablation study of the resource and performance trade-offs on the VAD task.
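The efficiency claim rests on how parameter counts scale for each building block. The minimal sketch below, in plain Python, counts parameters for a standard 1-D convolution versus a depthwise convolution followed by a grouped pointwise convolution, for a GRU, and for a hypothetical toy VAD stack whose channel width is scaled by a width multiplier `alpha`. The channel counts, kernel size, group count, GRU size, the SincNet-style assumption of two learnable parameters per sinc filter, and all function names are illustrative assumptions, not ResectNet's published configuration. The GRU count follows the PyTorch `nn.GRU` convention of separate input and recurrent bias vectors.

```python
"""Parameter-count sketch for a toy CRNN VAD stack (illustrative, not ResectNet)."""


def standard_conv_params(c_in: int, c_out: int, kernel: int) -> int:
    # Full 1-D convolution: every output channel sees every input channel.
    return c_in * c_out * kernel + c_out  # weights + biases


def depthwise_conv_params(channels: int, kernel: int) -> int:
    # One k-tap filter per channel (groups == channels), plus one bias per channel.
    return channels * kernel + channels


def grouped_pointwise_params(c_in: int, c_out: int, groups: int) -> int:
    # 1x1 convolution split into `groups` independent groups;
    # each output channel connects only to c_in // groups input channels.
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out + c_out


def separable_block_params(c_in: int, c_out: int, kernel: int, groups: int) -> int:
    # Depthwise convolution followed by a grouped pointwise convolution.
    return depthwise_conv_params(c_in, kernel) + grouped_pointwise_params(c_in, c_out, groups)


def gru_params(input_size: int, hidden_size: int) -> int:
    # Three gates (reset, update, candidate), each with input weights,
    # recurrent weights and two bias vectors (PyTorch nn.GRU convention).
    return 3 * hidden_size * (input_size + hidden_size) + 6 * hidden_size


def scale_channels(base: int, alpha: float, divisor: int) -> int:
    # Width multiplier: scale the channel count by alpha and round to a
    # multiple of `divisor` so the grouped convolution stays valid.
    return max(divisor, round(base * alpha / divisor) * divisor)


def toy_vad_params(alpha: float, base_channels: int = 64, kernel: int = 5,
                   groups: int = 4, gru_hidden: int = 16) -> int:
    # Hypothetical stack: sinc front end -> one separable block -> GRU -> linear head.
    c = scale_channels(base_channels, alpha, groups)
    sinc = 2 * c  # assumed SincNet-style: two learnable cutoff frequencies per filter
    block = separable_block_params(c, c, kernel, groups)
    gru = gru_params(c, gru_hidden)
    head = gru_hidden + 1  # single speech/non-speech logit plus bias
    return sinc + block + gru + head


if __name__ == "__main__":
    print("standard conv:", standard_conv_params(64, 64, 5))
    print("separable block:", separable_block_params(64, 64, 5, 4))
    for alpha in (0.25, 0.5, 1.0):
        print(f"alpha={alpha}: {toy_vad_params(alpha)} parameters")
```

With these assumed sizes, the depthwise plus grouped pointwise block needs 1,472 parameters against 20,544 for a standard convolution of the same shape, and halving `alpha` shrinks the toy model from 5,553 to 2,961 parameters. The pointwise term grows quadratically with channel width while the depthwise, sinc, and GRU input terms grow linearly, which is why a single width multiplier is an effective knob for trading accuracy against size.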
