SE-Conformer: Time-Domain Speech Enhancement Using Conformer

Eesung Kim; Hyeji Seo

2021 INTERSPEECH INTERSPEECH 2021

SE-Conformer: Time-Domain Speech Enhancement Using Conformer

Abstract

Convolution-augmented transformer (conformer) has recently shown competitive results in speech-domain applications, such as automatic speech recognition, continuous speech separation, and sound event detection. Conformer can capture both the short and long-term temporal sequence information by attending to the whole sequence at once with multi-head self-attention and convolutional neural network. However, the effectiveness of conformer in speech enhancement has not been demonstrated. In this paper, we propose an end-to-end speech enhancement architecture (SE-Conformer), incorporating a convolutional encoder–decoder and conformer, designed to be directly applied to the time-domain signal. We performed evaluations on both the VoiceBank-DEMAND Corpus (VCTK) and Librispeech datasets in terms of objective speech quality metrics. The experimental results show that the proposed model outperforms other competitive baselines in speech enhancement performance.

🌉 Interdisciplinary Bridge — Deep Learning and Speech & Audio

🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Deep Learning, Machine Learning, Natural Language Processing, Speech & Audio

Authors

Eesung Kim , Hyeji Seo

Topics

Deep Learning > Architectures > Transformers Speech & Audio > Synthesis > Speech Enhancement

Keywords

speech enhancement

Download PDF

Related papers

Energy-Friendly Keyword Spotting System Using Add-Based Convolution 2021

Dialogue Situation Recognition for Everyday Conversation Using Multimodal Information 2021

Using Games to Augment Corpora for Language Recognition and Confusability 2021

A Psychology-Driven Computational Analysis of Political Interviews 2021

The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results 2021