A Phase-Based Time-Frequency Masking for Multi-Channel Speech Enhancement in Domestic Environments

Alessio Brutti; Antigoni Tsiami; Athanasios Katsamanis; Petros Maragos

INTERSPEECH 2016

Abstract

This paper introduces a novel time-frequency masking approach for speech enhancement, based on the consistency of the phase of the cross-spectrum observed at multiple microphones. The proposed approach is derived from solutions commonly adopted in spatial source separation and can be used as a post-filter in traditional multi-channel speech enhancement schemes. Because it does not rely on a model of the coherence of diffuse noise, the proposed method complements traditional post-filter implementations by targeting non-diffuse, coherent sources. It is particularly effective in domestic scenarios where the microphones in a given room capture coherent interfering sources that are active in adjacent rooms. An experimental analysis on the DIRHA-GRID corpus shows that the proposed method considerably improves the signal-to-interference ratio and can be used on top of state-of-the-art multi-channel speech enhancement methods.
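
To make the idea of a phase-consistency mask concrete, the following is a minimal sketch for a single STFT frame and one microphone pair. It is not the method from the paper: the binary decision rule, the function name `phase_consistency_mask`, the tolerance `tol_rad`, and the assumption that the target's time difference of arrival `tdoa_s` is known are all illustrative assumptions. The sketch keeps a time-frequency bin only when the phase of the cross-spectrum agrees with the phase expected for the target direction.

```python
import cmath
import math


def phase_consistency_mask(x1, x2, freqs_hz, tdoa_s, tol_rad=0.5):
    """Binary mask for one STFT frame of a two-microphone pair.

    Hypothetical helper, not taken from the paper.
    x1, x2   : complex STFT coefficients of the same frame at mic 1 and mic 2
    freqs_hz : centre frequency of each bin, in Hz
    tdoa_s   : assumed delay of mic 2 relative to mic 1 for the target, in s
    tol_rad  : assumed maximum phase deviation (radians) for a bin to be kept
    """
    mask = []
    for a, b, f in zip(x1, x2, freqs_hz):
        # Cross-spectrum of the pair; its phase encodes the inter-mic delay.
        cross = a * b.conjugate()
        # If x2(t) = x1(t - tdoa), the cross-spectrum phase is 2*pi*f*tdoa.
        expected = 2.0 * math.pi * f * tdoa_s
        # Rotate by the expected phase so the deviation is wrapped to (-pi, pi].
        deviation = cmath.phase(cross * cmath.exp(-1j * expected))
        mask.append(1.0 if abs(deviation) <= tol_rad else 0.0)
    return mask
```

In such a sketch the mask would be multiplied bin by bin with the output of a beamformer, so that bins dominated by a coherent source arriving with a different delay are attenuated. A soft mask, or a consistency measure pooled over several microphone pairs, would be natural variations.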
