INTERSPEECH 2024

Reducing Speech Distortion and Artifacts for Speech Enhancement by Loss Function

Abstract

Deep learning-based speech enhancement has made significant progress. However, speech distortion and artifacts remain persistent problems. They degrade perceived audio quality and reduce the accuracy of speech recognition systems, particularly when lightweight models are used. This paper therefore investigates the mechanisms by which speech distortion and artifacts arise, and introduces a novel combined loss function that integrates Voice Activity Detection (VAD) information and speech continuity to mitigate them. In addition, we design a training strategy built on the proposed loss function to overcome the difficulty of training with this combined loss on extremely small models. Experiments on the DNS2020 dataset and on real meeting recordings show that the approach improves both subjective and objective speech quality metrics, as well as Automatic Speech Recognition (ASR) performance.
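The abstract does not give the exact form of the combined loss, so the following is only a hypothetical sketch of how VAD information and speech continuity could be folded into one objective. It assumes spectra are given as T frames by F bins, that a per-frame VAD label (1 = speech, 0 = non-speech) is available, and it splits the error into a speech-frame term (aimed at distortion), a non-speech-frame term (aimed at residual noise and artifacts in silence), and a continuity term that asks the frame-to-frame changes of the estimate to match those of the clean reference. The function name `combined_loss`, the weights `w_speech`, `w_noise`, `w_cont`, and this particular decomposition are illustrative assumptions, not the authors' formulation.

```python
from typing import Sequence

# T frames x F frequency bins (e.g. magnitude spectra); hypothetical layout.
Frames = Sequence[Sequence[float]]


def _frame_mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two frames of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combined_loss(est: Frames, ref: Frames, vad: Sequence[int],
                  w_speech: float = 1.0, w_noise: float = 1.0,
                  w_cont: float = 0.5) -> float:
    """Hypothetical VAD-aware loss with a temporal-continuity term.

    NOTE: illustrative sketch only, not the paper's actual loss.
    est, ref: enhanced and clean spectra, shape T x F.
    vad: per-frame labels, 1 = speech, 0 = non-speech.
    """
    if not (len(est) == len(ref) == len(vad)) or len(est) == 0:
        raise ValueError("est, ref and vad must be non-empty and equally long")

    # Split per-frame errors by VAD label and average each group separately,
    # so long silent stretches do not dominate the speech-frame error.
    speech = [_frame_mse(e, r) for e, r, v in zip(est, ref, vad) if v]
    noise = [_frame_mse(e, r) for e, r, v in zip(est, ref, vad) if not v]
    speech_term = sum(speech) / len(speech) if speech else 0.0
    noise_term = sum(noise) / len(noise) if noise else 0.0

    # Continuity: the change between adjacent estimated frames should match
    # the change between adjacent clean frames; abrupt jumps that the clean
    # signal does not contain are penalized.
    cont = []
    for t in range(1, len(est)):
        d_est = [a - b for a, b in zip(est[t], est[t - 1])]
        d_ref = [a - b for a, b in zip(ref[t], ref[t - 1])]
        cont.append(_frame_mse(d_est, d_ref))
    cont_term = sum(cont) / len(cont) if cont else 0.0

    return w_speech * speech_term + w_noise * noise_term + w_cont * cont_term


if __name__ == "__main__":
    ref = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
    vad = [0, 1, 1]
    est = [[0.5, 0.5], [1.0, 1.0], [1.0, 1.0]]  # residual energy in a silent frame
    print(combined_loss(ref, ref, vad))  # 0.0 for a perfect estimate
    print(combined_loss(est, ref, vad))  # 0.3125: noise term plus continuity penalty
```

In the usage example, the leftover energy in the non-speech frame is penalized twice: once directly by the non-speech term and once by the continuity term, because it weakens the onset into the speech frame. Pure Python keeps the sketch dependency-free; a real training setup would express the same terms with a differentiable framework so gradients can flow to the model.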
