INTERSPEECH 2019

A Convolutional Neural Network with Non-Local Module for Speech Enhancement

Abstract

Convolutional neural networks (CNNs) have recently attracted increasing attention for the speech enhancement task. However, convolutional operations process only a local neighborhood (several nearest neighboring neurons) at a time, along either the space or the time direction. Long-range dependencies can therefore be captured only by applying convolutional operations repeatedly, which introduces computational inefficiency and optimization difficulties. Inspired by the recent impressive performance of the non-local module in many computer vision tasks, we propose a convolutional neural network with a non-local module for speech enhancement. Non-local operations can capture global information in the frequency domain by passing information between distant time-frequency units. They also accept inputs of arbitrary dimension, which makes them easy to integrate into our proposed network framework. Experimental results demonstrate that the proposed method not only significantly improves computational efficiency but also outperforms the competing methods in terms of objective speech intelligibility and quality metrics.
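
To illustrate how a non-local operation passes information between distant time-frequency units, here is a minimal NumPy sketch. It assumes the embedded-Gaussian (softmax) form of the non-local block commonly used in computer vision; the abstract does not state which variant or configuration the proposed network uses, so the function name `non_local_block`, the weight names, and all shapes are hypothetical.

```python
import numpy as np

def softmax(a, axis=-1):
    # Numerically stable softmax along the given axis.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, w_theta, w_phi, w_g, w_out):
    """Assumed embedded-Gaussian non-local block (illustrative only).

    x       : feature map of shape (C, T, F) = channels, time frames, frequency bins
    w_theta, w_phi, w_g : (C, C_inner) projections (stand-ins for 1x1 convolutions)
    w_out   : (C_inner, C) projection back to the input channel count
    Returns an array with the same shape as x.
    """
    c, t, f = x.shape
    flat = x.reshape(c, t * f).T            # (N, C), one row per time-frequency unit
    theta = flat @ w_theta                  # (N, C_inner) queries
    phi = flat @ w_phi                      # (N, C_inner) keys
    g = flat @ w_g                          # (N, C_inner) values
    # Pairwise affinity between every pair of time-frequency units,
    # normalized so each row sums to 1.
    attn = softmax(theta @ phi.T, axis=-1)  # (N, N)
    y = attn @ g                            # each unit aggregates from all units
    z = y @ w_out + flat                    # residual connection keeps the input
    return z.T.reshape(c, t, f)

# The same weights work for inputs with different numbers of frames.
rng = np.random.default_rng(0)
C, C_inner = 4, 2
weights = [rng.standard_normal(s) * 0.1
           for s in [(C, C_inner), (C, C_inner), (C, C_inner), (C_inner, C)]]
short = non_local_block(rng.standard_normal((C, 5, 8)), *weights)
longer = non_local_block(rng.standard_normal((C, 11, 8)), *weights)
```

Because the weights act only on the channel dimension, the block places no constraint on the number of time frames or frequency bins, which is one way to read the abstract's remark about arbitrary input dimension. Note that the affinity matrix grows quadratically with the number of time-frequency units.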
