2023 INTERSPEECH INTERSPEECH 2023

Aligning Speech Enhancement for Improving Downstream Classification Performance

Abstract

Speech-based classification models in the cloud are gaining large-scale adoption. In many applications where post-deployment background noise conditions mismatch those used during model training, fine-tuning the original model on local data would likely improve performance. However, this is not always possible as the local user may not be authorized to modify the cloud-based model or the local user may be unable to share the data and corresponding labels required for fine-tuning. In this paper, we propose a denoiser stored locally on edge devices with an application-specific training scheme. It learns a custom speech enhancement scheme that aligns the local denoiser with the downstream model, without requiring access to the cloud-based weights. We evaluate the denoiser with a common classification task - keyword spotting - and demonstrate using two different architectures that the proposed scheme outperforms common speech enhancement models for different types of background noise.

🌉 Interdisciplinary Bridge — Machine Learning and Speech & Audio
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio