INTERSPEECH 2023

Dynamic Encoder RNN for Online Voice Activity Detection in Adverse Noise Conditions

Abstract

Most online Voice Activity Detection (VAD) models employ a Recurrent Neural Network (RNN) component to capture long context, which helps to improve noise robustness. These RNN components are static models that do not make efficient use of the model's predictions from previous frames. In this work, we introduce a new Dynamic Encoder RNN (DE-RNN) that encodes the target speech dynamically to help distinguish target speech from noise. Experiments in which the RNN component of several established baseline architectures is augmented with DE-RNN show improvements in both background-noise and competing secondary-speaker scenarios. All experiments use publicly available datasets.
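
To make the idea of reusing previous-frame predictions concrete, the following is a minimal illustrative sketch and not the DE-RNN architecture itself, whose details the abstract does not give. It assumes a plain tanh recurrent cell, a running "speech encoding" updated as an exponential average of past frames weighted by the previous frame's speech probability, and untrained random weights. The class name `DynamicEncoderRNNSketch`, the parameter `alpha`, and the update rule are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class DynamicEncoderRNNSketch:
    """Hypothetical frame-by-frame VAD cell with a prediction-driven encoding.

    Assumption: this is NOT the paper's DE-RNN; it only illustrates feeding
    the model's own previous prediction back into a running speech encoding.
    """

    def __init__(self, feat_dim, hidden_dim, alpha=0.1, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = 2 * feat_dim + hidden_dim  # [frame, encoding, hidden state]
        self.W = rng.normal(0.0, 0.1, size=(hidden_dim, in_dim))
        self.b = np.zeros(hidden_dim)
        self.v = rng.normal(0.0, 0.1, size=hidden_dim)
        self.c = 0.0
        self.alpha = alpha  # assumed update rate of the speech encoding
        self.feat_dim = feat_dim
        self.hidden_dim = hidden_dim

    def run(self, frames):
        """Process frames causally; return per-frame speech probabilities."""
        h = np.zeros(self.hidden_dim)
        enc = np.zeros(self.feat_dim)
        prev_p = 0.0
        prev_x = np.zeros(self.feat_dim)
        probs = []
        for x in frames:
            # Move the encoding toward the previous frame, scaled by how
            # confident the model was that the previous frame was speech.
            enc = enc + self.alpha * prev_p * (prev_x - enc)
            # Standard recurrent update, conditioned on the encoding.
            h = np.tanh(self.W @ np.concatenate([x, enc, h]) + self.b)
            p = float(sigmoid(self.v @ h + self.c))
            probs.append(p)
            prev_p, prev_x = p, x
        return np.array(probs), enc


# Usage with random stand-in features (20 frames, 8 dimensions each).
model = DynamicEncoderRNNSketch(feat_dim=8, hidden_dim=16)
frames = np.random.default_rng(1).normal(size=(20, 8))
probs, encoding = model.run(frames)
```

Because each step depends only on the current frame and on state carried over from earlier frames, the sketch can run online. A static RNN, in this simplified picture, corresponds to dropping the `enc` input, so that earlier predictions never influence later ones except through the hidden state.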
