Reducing Speech Distortion and Artifacts for Speech Enhancement by Loss Function

Haixin Guan; Wei Dai; Guangyong Wang; Xiaobin Tan; Peng Li; Jiaen Liang

INTERSPEECH 2024

Abstract

Deep learning-based speech enhancement has made significant strides, but speech distortion and artifacts persist. These issues can degrade perceived audio quality and the accuracy of speech recognition systems, particularly when lightweight models are used. This paper therefore investigates the underlying principles governing the formation of speech distortion and artifacts, and introduces a novel combined loss function that integrates Voice Activity Detection (VAD) information and speech continuity to mitigate them. Because training with this combined loss is difficult on extremely small models, a new training strategy is also designed around the proposed loss function. Experiments on the DNS2020 dataset and real meeting data show that the approach improves both subjective and objective speech metrics, as well as Automatic Speech Recognition (ASR) performance.
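The abstract does not state the loss formula, so the following is only a rough sketch of how VAD labels and a continuity term might be combined in a spectral loss; it is not the authors' formulation. The function name `vad_continuity_loss`, the three terms, and the weights `w_speech`, `w_noise`, and `w_cont` are all assumptions made for illustration. Speech frames are scored on reconstruction error (distortion), non-speech frames on residual energy (artifacts), and the continuity term compares frame-to-frame changes of the estimate against those of the clean reference.

```python
import numpy as np


def vad_continuity_loss(est_mag, ref_mag, vad,
                        w_speech=1.0, w_noise=1.0, w_cont=0.1):
    """Illustrative VAD-weighted loss on magnitude spectrograms.

    est_mag, ref_mag: arrays of shape (frames, bins).
    vad: array of shape (frames,), 1 for speech frames, 0 otherwise.
    All terms and weights are hypothetical, not taken from the paper.
    """
    est_mag = np.asarray(est_mag, dtype=float)
    ref_mag = np.asarray(ref_mag, dtype=float)
    vad = np.asarray(vad, dtype=float)[:, None]  # broadcast over bins

    n_bins = est_mag.shape[1]
    n_speech = max(vad.sum(), 1.0)          # avoid division by zero
    n_noise = max((1.0 - vad).sum(), 1.0)

    # Speech frames: squared error against the clean reference.
    speech_term = (vad * (est_mag - ref_mag) ** 2).sum() / (n_speech * n_bins)

    # Non-speech frames: any remaining energy counts as residual noise.
    noise_term = ((1.0 - vad) * est_mag ** 2).sum() / (n_noise * n_bins)

    # Continuity: mismatch between temporal deltas of estimate and reference.
    if est_mag.shape[0] > 1:
        delta_err = np.diff(est_mag, axis=0) - np.diff(ref_mag, axis=0)
        cont_term = float(np.mean(delta_err ** 2))
    else:
        cont_term = 0.0

    return w_speech * speech_term + w_noise * noise_term + w_cont * cont_term


if __name__ == "__main__":
    ref = np.array([[1.0, 2.0], [0.0, 0.0], [3.0, 1.0]])
    vad = np.array([1, 0, 1])
    est = ref.copy()
    est[1] = [0.5, 0.5]  # residual energy in the non-speech frame
    print(vad_continuity_loss(ref, ref, vad))
    print(vad_continuity_loss(est, ref, vad))
```

A perfect estimate gives zero loss, while energy left in a non-speech frame raises both the noise term and the continuity term. In practice a loss like this would be written in a framework with automatic differentiation; NumPy is used here only to keep the sketch self-contained.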

Topics

Machine Learning > Optimization & Theory > Loss Functions
Speech & Audio > Synthesis > Speech Enhancement

Keywords

speech enhancement, loss function, voice activity detection, speech distortion, artifact reduction

Related papers

Reshape Dimensions Network for Speaker Recognition (2024)

RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification (2024)

Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch (2024)

Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions (2024)

K-means and hierarchical clustering of f0 contours (2024)