Reference-Based Speech Enhancement via Feature Alignment and Fusion Network

Huanjing Yue; Wenxin Duo; Xiulian Peng; Jingyu Yang

AAAI 2022

Abstract

Speech enhancement aims to recover clean speech from a noisy input and can be classified into single speech enhancement and personalized speech enhancement. Personalized speech enhancement usually uses the speaker identity extracted from the noisy speech itself (or from a clean reference speech) as a global embedding to guide the enhancement process. In contrast, we observe that utterances from the same speaker are correlated at the frame level in their short-time Fourier transform (STFT) spectrograms. We therefore propose reference-based speech enhancement via a feature alignment and fusion network (FAF-Net). Given a noisy speech and a clean reference speech spoken by the same speaker, we first propose a feature-level alignment strategy that aligns the clean reference with the noisy speech at the frame level. Then, we fuse the reference feature with the noisy feature via a similarity-based fusion strategy. Finally, the fused features are passed through skip connections to the decoder, which generates the enhanced results. Experimental results demonstrate that the performance of the proposed FAF-Net is close to that of state-of-the-art speech enhancement methods on both the DNS and Voice Bank+DEMAND datasets. Our code is available at https://github.com/HieDean/FAF-Net.
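To make the align-then-fuse idea concrete, here is a minimal NumPy sketch. It is not the FAF-Net implementation: the function name `align_and_fuse`, the use of cosine similarity, the hard nearest-frame selection, and the convex blending rule are all illustrative assumptions, and the abstract does not specify how the network computes alignment or fusion.

```python
import numpy as np

def align_and_fuse(noisy_feat, ref_feat, eps=1e-8):
    """Hypothetical frame-level alignment and similarity-weighted fusion.

    noisy_feat: (T, D) features of the noisy utterance, one row per frame.
    ref_feat:   (T_ref, D) features of the clean reference utterance.
    Returns the fused (T, D) features and the matched reference frame indices.
    """
    # Assumption: cosine similarity measures frame-to-frame correlation.
    n = noisy_feat / (np.linalg.norm(noisy_feat, axis=1, keepdims=True) + eps)
    r = ref_feat / (np.linalg.norm(ref_feat, axis=1, keepdims=True) + eps)
    sim = n @ r.T                       # (T, T_ref) similarity matrix

    # Alignment: pick the most similar reference frame for each noisy frame.
    best = sim.argmax(axis=1)           # (T,)
    aligned = ref_feat[best]            # (T, D)

    # Fusion: trust the reference more when the match is stronger.
    weight = np.clip(sim.max(axis=1), 0.0, 1.0)[:, None]
    fused = (1.0 - weight) * noisy_feat + weight * aligned
    return fused, best

# Toy usage with hand-made features (3 reference frames, 2 noisy frames).
ref = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
noisy = np.array([[0.0, 2.0, 0.0], [0.9, 0.1, 0.0]])
fused, best = align_and_fuse(noisy, ref)
```

In this toy case each noisy frame is matched to the reference frame pointing in the most similar direction, and a well-matched frame is pulled almost entirely toward its reference counterpart. A learned network would typically replace both the similarity measure and the blending rule with trainable components.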

Topics

Deep Learning > Architectures > Autoencoders

Speech & Audio > Synthesis > Speech Enhancement

Deep Learning > Learning Types > Deep Learning

Speech & Audio > Processing > Speech Enhancement

Deep Learning > Application Areas > Efficient Computing

Keywords

feature alignment; speech enhancement; speaker embedding; signal processing; short-time Fourier transform; speaker identity; fusion network; personalized enhancement; clean reference
