Dynamic Encoder RNN for Online Voice Activity Detection in Adverse Noise Conditions

Prithvi R.R. Gudepu; Jayesh M. Koroth; Kamini Sabu; Mahaboob Ali Basha Shaik

INTERSPEECH 2023

Abstract

Most online Voice Activity Detection (VAD) models use a Recurrent Neural Network (RNN) component to capture long-range context, which improves noise robustness. These RNN components are static: they do not make efficient use of the model's predictions from previous frames. In this work, we introduce a new Dynamic Encoder RNN (DE-RNN) that encodes the target speech dynamically to help distinguish it from noise. We add DE-RNN to the RNN component of several established baseline architectures, and experiments show improvements in both background-noise and competing-secondary-speaker scenarios. All experiments use publicly available datasets.
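To make the idea of reusing predictions from previous frames concrete, the following is a minimal sketch and not the DE-RNN architecture itself, which the abstract does not specify. It assumes a toy Elman RNN with random, untrained weights and a hypothetical "target speech" encoding kept as a running mean of frame features, weighted by the model's own speech probability. The names `FeedbackVAD`, `alpha`, and `encoding` are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class FeedbackVAD:
    """Toy online VAD (illustrative only, untrained random weights).

    An Elman RNN whose input is augmented with a running encoding of
    frames it previously judged to be speech.
    """

    def __init__(self, feat_dim, hidden_dim, alpha=0.1, seed=0):
        rng = random.Random(seed)
        in_dim = 2 * feat_dim  # frame features + encoding of the same size
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                     for _ in range(hidden_dim)]
        self.w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
                      for _ in range(hidden_dim)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
        self.alpha = alpha  # assumed update rate for the encoding
        self.hidden = [0.0] * hidden_dim
        self.encoding = [0.0] * feat_dim

    def step(self, frame):
        # The encoding used here was built only from earlier frames,
        # so the step stays causal (online).
        inp = list(frame) + self.encoding
        self.hidden = [
            math.tanh(sum(w * x for w, x in zip(row_in, inp))
                      + sum(w * h for w, h in zip(row_rec, self.hidden)))
            for row_in, row_rec in zip(self.w_in, self.w_rec)
        ]
        p_speech = sigmoid(sum(w * h for w, h in zip(self.w_out, self.hidden)))
        # Feedback: frames judged more likely to be speech move the
        # encoding further toward the current frame's features.
        gain = self.alpha * p_speech
        self.encoding = [e + gain * (x - e)
                         for e, x in zip(self.encoding, frame)]
        return p_speech


vad = FeedbackVAD(feat_dim=3, hidden_dim=4)
frames = [[0.1, 0.2, 0.3], [0.9, 0.8, 0.7], [0.0, 0.1, 0.0]]
probs = [vad.step(f) for f in frames]
```

A static RNN would see only `frame` at each step. In this sketch the extra `encoding` input changes as predictions accumulate, which is one simple reading of "dynamic"; a real system would learn both the weights and the encoder.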
