Multi-Level Adaptive Speech Activity Detector for Speech in Naturalistic Environments

Bidisha Sharma; Rohan Kumar Das; Haizhou Li

INTERSPEECH 2019

Abstract

Speech activity detection (SAD) is a component of many speech processing applications. Traditional SAD approaches use signal energy as evidence to identify speech regions. However, such methods perform poorly in uncontrolled environments. In this work, we propose a novel SAD approach that makes a multi-level decision using signal knowledge in an adaptive manner. The multi-level evidence comprises the modulation spectrum and the smoothed Hilbert envelope of the linear prediction (LP) residual. The modulation spectrum has compelling parallels to the dynamics of speech production and captures information only for the speech component. In contrast, the Hilbert envelope of the LP residual captures the excitation source aspect of speech. In uncontrolled scenarios, both sources of evidence are found to be robust to signal distortions and are thus expected to work well. Because the level of interference varies across signals, we propose to use a quality factor to control the speech/non-speech decision in an adaptive manner. We refer to this method as multi-level adaptive SAD and evaluate it on the Fearless Steps corpus, which was collected during the Apollo-11 mission in naturalistic environments. The proposed multi-level adaptive SAD achieves a detection cost function of 7.35% on the evaluation set of the Fearless Steps 2019 challenge corpus.
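To make one of the two evidence streams concrete, the following is a minimal Python sketch of a smoothed Hilbert envelope of the LP residual, followed by a threshold decision. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not define the quality factor, the LP order, the smoothing window, or how the modulation spectrum is combined, so `order`, `smooth_ms`, and `scale` are hypothetical parameters, LP analysis is done once over the whole signal rather than frame by frame, and the modulation-spectrum level is omitted entirely.

```python
import numpy as np


def lp_residual(x, order=10):
    """LP residual via the autocorrelation method (whole-signal analysis, an assumption)."""
    x = np.asarray(x, dtype=float)
    # Autocorrelation values r[0..order].
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1..order], lightly regularized for stability.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += (1e-9 * r[0] + 1e-12) * np.eye(order)
    a = np.linalg.solve(R, r[1 : order + 1])
    # Inverse filter: e[n] = x[n] - sum_k a[k] * x[n - k].
    inverse_filter = np.concatenate(([1.0], -a))
    return np.convolve(x, inverse_filter)[: len(x)]


def hilbert_envelope(e):
    """Magnitude of the analytic signal, computed with an FFT-based Hilbert transform."""
    n = len(e)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1 : n // 2] = 2.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(e) * weights))


def detect_speech(x, fs, order=10, smooth_ms=20.0, scale=1.0):
    """Per-sample speech/non-speech mask from the smoothed envelope.

    `scale` is a hypothetical stand-in for a quality-dependent control;
    the quality factor used in the paper is not specified in the abstract.
    """
    envelope = hilbert_envelope(lp_residual(x, order))
    window = max(1, int(fs * smooth_ms / 1000.0))
    smoothed = np.convolve(envelope, np.ones(window) / window, mode="same")
    threshold = scale * smoothed.mean()
    return smoothed > threshold
```

In this sketch, raising `scale` makes the detector more conservative and lowering it makes it more permissive, which is one simple way an adaptive control could act on the decision.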

Keywords

speech processing; linear prediction; voice activity detection; linear prediction residual; speech activity detection; Hilbert envelope; adaptive detection; modulation spectrum; adaptive signal processing; naturalistic environment
