Single-Channel Blind Direct-to-Reverberation Ratio Estimation Using Masking

Wolfgang Mack; Shuwen Deng; Emanuël A.P. Habets

INTERSPEECH 2020

Abstract

Acoustic parameters such as the direct-to-reverberation ratio (DRR) are used in audio processing algorithms, e.g., for dereverberation or in audio augmented reality. Often, the DRR is not available and has to be estimated blindly from recorded audio signals. State-of-the-art DRR estimation relies on deep neural networks (DNNs) that directly map a feature representation of the acquired signals to the DRR. Motivated by the equality of the signal-to-reverberation ratio and the (channel-based) DRR under certain conditions, we formulate single-channel DRR estimation as the extraction of two signal components from the recorded audio. The DRR is then obtained by inserting the estimated signals into the definition of the DRR. The extraction is performed using time-frequency masks, which are estimated by a DNN trained end-to-end to minimize the mean-squared error between the estimated and the oracle DRR. We conduct experiments with different pre-processing and mask estimation schemes. The proposed method outperforms state-of-the-art single- and multi-channel methods on the ACE challenge data corpus.
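
The masking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the two components are obtained by multiplying a complex STFT with two real-valued masks, that the DRR is taken as the energy ratio of the two masked signals in decibels, and that the energies are summed in the STFT domain. The names `drr_from_masks`, `mse_loss`, `mask_direct`, and `mask_reverb` are hypothetical, and the constant masks stand in for the output of a DNN.

```python
import numpy as np


def drr_from_masks(stft, mask_direct, mask_reverb, eps=1e-12):
    """Estimate the DRR in dB from a mixture STFT and two masks (illustrative only)."""
    # Extract the two signal components by element-wise masking.
    direct = mask_direct * stft
    reverb = mask_reverb * stft
    # Insert the extracted components into an energy-ratio definition of the DRR.
    energy_direct = np.sum(np.abs(direct) ** 2)
    energy_reverb = np.sum(np.abs(reverb) ** 2)
    return 10.0 * np.log10((energy_direct + eps) / (energy_reverb + eps))


def mse_loss(drr_est, drr_oracle):
    """Mean-squared error between estimated and oracle DRR values (in dB)."""
    diff = np.asarray(drr_est, dtype=float) - np.asarray(drr_oracle, dtype=float)
    return float(np.mean(diff ** 2))


# Toy input: a random complex "STFT" with 257 frequency bins and 100 frames.
rng = np.random.default_rng(0)
stft = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))

# Constant masks as placeholders for DNN-estimated masks (assumption).
mask_direct = np.full(stft.shape, 0.8)
mask_reverb = np.full(stft.shape, 0.4)

drr_db = drr_from_masks(stft, mask_direct, mask_reverb)
loss = mse_loss([drr_db], [5.0])  # 5.0 dB is an arbitrary oracle value
print(f"estimated DRR: {drr_db:.2f} dB, squared error vs. oracle: {loss:.2f}")
```

With constant masks the energy ratio reduces to (0.8 / 0.4)^2 = 4, i.e. about 6.02 dB, whatever the input. In an end-to-end setup the masks would come from the network, and the loss gradient would pass through the DRR computation back to the mask estimator.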

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Application Areas > Domain Adaptation
Deep Learning > Models > Generative Models
Speech & Audio > Analysis > Speech Enhancement

Keywords

speech dereverberation, audio processing, deep neural network, signal extraction, time-frequency mask, mask estimation, blind estimation, direct-to-reverberation ratio
