2023
INTERSPEECH
INTERSPEECH 2023
Dynamic Encoder RNN for Online Voice Activity Detection in Adverse Noise Conditions
Abstract
The majority of online Voice Activity Detection (VAD) models employ a Recurrent Neural Network (RNN) component to capture long context which helps to improve noise-robustness. These RNN components are static models which do not make efficient use of the model's predictions from previous frames. In this work, we introduce a new Dynamic Encoder RNN (DE-RNN) that encodes the target speech dynamically to facilitate distinguishing of target speech from noise. Experiments on different established baseline architectures by modifying their RNN component by the addition of DE-RNN, show improvement in both background noise and secondary competing speaker noise scenarios. We used publicly available datasets for experiments.
🧭
Keyword Pioneer
— dynamic encoder
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio
🌉
Interdisciplinary Bridge
— Deep Learning and Machine Learning