The Application of Learnable STRF Kernels to the 2021 Fearless Steps Phase-03 SAD Challenge

Tyler Vuong; Yangyang Xia; Richard M. Stern

INTERSPEECH 2021

Abstract

We describe a deep-learning-based system developed for the Fearless Steps Phase-03 Speech Activity Detection (SAD) challenge. The first layer of the system combines learnable spectro-temporal receptive fields (STRFs) with unconstrained 2-dimensional convolutional kernels. Experiments show that including learnable STRFs in the first layer increases the system’s robustness to additive noise. Additionally, we found that applying SpecAugment during training improves generalization to unseen data. By incorporating these enhancements and others, our system achieved the best score in the official SAD challenge.
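The SpecAugment step mentioned in the abstract can be illustrated with a minimal sketch. It covers only the frequency- and time-masking part of SpecAugment, applied to a spectrogram stored as a plain list of rows. The function name `spec_augment`, the mask widths, and the choice of zero as the fill value are assumptions made for illustration; the abstract does not state the settings the authors used.

```python
import random


def spec_augment(spec, max_freq_width=8, max_time_width=10, rng=None):
    """Return a masked copy of `spec`, a list of frequency rows over time frames.

    Hypothetical helper: one frequency band and one time span are set to zero.
    The widths are illustrative defaults, not values taken from the paper.
    """
    rng = rng or random.Random()
    n_freq = len(spec)
    n_time = len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is left untouched

    # Frequency mask: zero `f` consecutive channels starting at `f0`.
    f = rng.randint(0, min(max_freq_width, n_freq))
    f0 = rng.randint(0, n_freq - f)
    for i in range(f0, f0 + f):
        out[i] = [0.0] * n_time

    # Time mask: zero `t` consecutive frames starting at `t0` in every channel.
    t = rng.randint(0, min(max_time_width, n_time))
    t0 = rng.randint(0, n_time - t)
    for row in out:
        row[t0:t0 + t] = [0.0] * t

    return out


# Usage on a dummy 40-channel, 100-frame spectrogram of ones.
spec = [[1.0] * 100 for _ in range(40)]
augmented = spec_augment(spec, rng=random.Random(0))
```

Masks of this kind are drawn afresh for each training example, so the network cannot rely on any single band or short time span being present.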

Keywords

speech processing; deep learning; speech activity detection; spectro-temporal receptive field

Related papers

Energy-Friendly Keyword Spotting System Using Add-Based Convolution 2021

Dialogue Situation Recognition for Everyday Conversation Using Multimodal Information 2021

Using Games to Augment Corpora for Language Recognition and Confusability 2021

A Psychology-Driven Computational Analysis of Political Interviews 2021

The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results 2021