Knowledge Distillation for Tiny Speech Enhancement with Latent Feature Augmentation

Behnam Gholami; Mostafa El-Khamy; KeeBong Song

INTERSPEECH 2024

Abstract

Recent deep neural network (DNN) models have achieved high performance in speech enhancement. However, deploying such complex models in resource-constrained environments without significant performance degradation is challenging. Knowledge distillation (KD), a technique in which a smaller (student) model is trained to mimic the behavior of a larger, more complex (teacher) model, has emerged as a popular approach to this challenge. In this paper, we propose a feature-augmentation-based knowledge distillation method for speech enhancement that leverages the information stored in the intermediate latent features of the DNN teacher model to train a smaller, more efficient student model. Experimental results on the VoiceBank+DEMAND dataset demonstrate the effectiveness of the proposed knowledge distillation method for speech enhancement.
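
To make the idea of distilling from intermediate latent features concrete, the following is a minimal sketch of a generic feature-matching distillation loss. It is not the authors' method: the abstract does not specify the augmentation step, the loss form, or the architectures, so the function names (`mse`, `project`, `distillation_loss`), the linear projection, the mean-squared-error losses, and the `alpha` weighting are all assumptions chosen for illustration, and the augmentation itself is deliberately left out.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def project(features, weights):
    """Linearly map student features to the teacher's latent width.

    `weights` has one row per teacher dimension and one column per
    student dimension (a hypothetical learned projection).
    """
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def distillation_loss(student_out, clean_target, student_latent,
                      teacher_latent, weights, alpha=0.5):
    """Weighted sum of an enhancement loss and a latent feature-matching loss.

    alpha = 0 trains on the clean target only; alpha = 1 only matches
    the teacher's latent features. Both terms are assumed to be MSE.
    """
    task_loss = mse(student_out, clean_target)
    feature_loss = mse(project(student_latent, weights), teacher_latent)
    return (1.0 - alpha) * task_loss + alpha * feature_loss


# Toy values: a 2-dimensional student latent matched to a 3-dimensional teacher latent.
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
loss = distillation_loss(
    student_out=[0.0, 1.0],
    clean_target=[0.0, 0.0],
    student_latent=[1.0, 3.0],
    teacher_latent=[1.0, 3.0, 1.0],
    weights=weights,
    alpha=0.5,
)
```

In a real system the vectors would be framewise tensors from a deep learning framework and the projection would be trained jointly with the student; the plain-list version above only shows how the two loss terms combine.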

Topics

Machine Learning > Application Areas > Knowledge Distillation
Speech & Audio > Synthesis > Speech Enhancement

Keywords

model compression; knowledge distillation; speech enhancement; deep neural network; feature augmentation; latent feature; latent feature augmentation; resource-constrained environment
