Voice Activity Detection with Teacher-Student Domain Emulation

Jarrod Luckenbaugh; Samuel Abplanalp; Rachel Gonzalez; Daniel Fulford; David Gard; Carlos Busso

INTERSPEECH 2021

Abstract

Transfer learning is a promising approach to improving the performance of many speech-based systems, including voice activity detection (VAD). Domain adaptation, a subfield of transfer learning, often improves model conditioning when training and test conditions are mismatched. This study proposes a formulation for VAD based on teacher-student training, in which a teacher model trained on clean data transfers knowledge to a student model trained on a noisy, paired version of the corpus that resembles the test conditions. The models leverage temporal information using recurrent neural networks (RNNs), implemented with either bidirectional long short-term memory (BLSTM) or the modern, continuous-state Hopfield network. We provide evidence that in-domain noise emulation for domain adaptation is viable under unconstrained audio channel conditions for VAD “in the wild.” Our application domain is healthcare, where multimodal sensors on portable devices, including microphones, are used to automatically predict social isolation in patients affected by schizophrenia. We empirically show positive results for domain emulation when the training conditions are similar to the target domain. We also show that the Hopfield network outperforms our best BLSTM for VAD on real-world benchmarks.
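
To make the teacher-student idea concrete, here is a minimal sketch of a frame-level training loss, not the authors' formulation. It assumes the teacher sees the clean recording and the student sees the paired noisy recording, so that frame i in one aligns with frame i in the other. The function names, the mixing weight `alpha`, and the use of binary cross-entropy for both terms are illustrative assumptions; the abstract does not specify the loss.

```python
import math


def binary_cross_entropy(target, pred, eps=1e-7):
    """Cross-entropy between a target speech probability and a predicted one."""
    # Clip the prediction so that log() never receives 0.
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def teacher_student_loss(student_probs, teacher_probs, labels, alpha=0.5):
    """Mean per-frame loss for a student VAD model (hypothetical formulation).

    student_probs: speech posteriors from the student on NOISY frames
    teacher_probs: speech posteriors from the teacher on the paired CLEAN frames
    labels:        ground-truth 0/1 speech labels per frame
    alpha:         assumed weight on the hard-label term (1 - alpha on the soft term)
    """
    if not (len(student_probs) == len(teacher_probs) == len(labels)):
        raise ValueError("all inputs must have the same number of frames")
    if not labels:
        raise ValueError("at least one frame is required")

    total = 0.0
    for s, t, y in zip(student_probs, teacher_probs, labels):
        hard = binary_cross_entropy(float(y), s)  # match the ground truth
        soft = binary_cross_entropy(t, s)         # match the teacher's posterior
        total += alpha * hard + (1.0 - alpha) * soft
    return total / len(labels)


# Usage: three aligned frames from a clean/noisy pair (made-up numbers).
loss = teacher_student_loss(
    student_probs=[0.7, 0.3, 0.6],
    teacher_probs=[0.9, 0.1, 0.8],
    labels=[1, 0, 1],
)
print(f"teacher-student loss: {loss:.4f}")
```

The soft term is what carries knowledge across domains in this sketch: the teacher's posterior on the clean frame acts as a target for the student on the corresponding noisy frame. Setting `alpha=1.0` reduces it to ordinary supervised training.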

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Application Areas > Domain Adaptation
Deep Learning > Architectures > Neural Networks
Machine Learning > Learning Types > Transfer Learning

Keywords

domain adaptation, knowledge distillation, Hopfield network, recurrent neural network, bidirectional LSTM, teacher-student learning, voice activity detection
