INTERSPEECH 2023

Zoneformer: On-device Neural Beamformer For In-car Multi-zone Speech Separation, Enhancement and Echo Cancellation

Abstract

Despite the recent success of all-neural beamforming approaches for speech separation, deploying them on low-powered devices is difficult because of their heavy computational requirements. To address this issue, we present a lightweight on-device Mel-subband neural beamformer for in-car multi-zone speech separation and introduce several effective methods to boost its performance. First, we propose a global full-band spectral and spatial embedding to assist the separation in each Mel-subband. Second, we incorporate an explicit distortionless constraint to control non-linear distortion. Finally, we use teacher-student learning and quantization-aware training (QAT) to improve performance and accelerate inference. Experimental results show that the proposed methods achieve a significant word error rate (WER) reduction on real-recorded data and a real-time factor (RTF) of 0.39 on the device.
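The abstract does not spell out how the Mel-subband split or the global full-band embedding is computed. The sketch below is a minimal illustration under stated assumptions, not the Zoneformer implementation: it assumes the HTK Mel formula, groups the STFT bins of a multichannel signal into contiguous Mel-spaced subbands, and appends to every subband's features one shared full-band embedding. That embedding is a hypothetical fixed statistic (mean log-magnitude per channel plus mean cosine of the inter-channel phase difference relative to channel 0, both over all bins) standing in for whatever spectral and spatial embedding the model actually computes. The function names `mel_subband_edges` and `subband_features_with_global_embedding` are illustrative only.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel scale (assumption: the paper's exact Mel variant is not stated)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_subband_edges(n_bins, sr, n_subbands):
    """Split STFT bins [0, n_bins) into n_subbands contiguous, Mel-spaced groups.

    Returns n_subbands + 1 strictly increasing bin edges; subband k covers
    bins edges[k]:edges[k + 1].
    """
    mel = np.linspace(0.0, hz_to_mel(sr / 2.0), n_subbands + 1)
    edges = np.round(mel_to_hz(mel) / (sr / 2.0) * n_bins).astype(int)
    for k in range(1, len(edges)):
        # Narrow low-frequency bands can round to zero width; force >= 1 bin each
        edges[k] = max(edges[k], edges[k - 1] + 1)
    if edges[-1] != n_bins:
        raise ValueError("too many subbands for this number of STFT bins")
    return edges


def subband_features_with_global_embedding(stft, edges):
    """stft: complex array of shape (channels, n_bins, frames).

    Returns one real feature array per subband, of shape
    (2 * channels * width + 2 * channels - 1, frames): the subband's real and
    imaginary parts followed by a full-band embedding shared by all subbands.
    """
    C, F, T = stft.shape
    # Hypothetical stand-in for a learned global embedding, pooled over ALL bins:
    # spectral part = mean log-magnitude per channel,
    # spatial part  = mean cos(inter-channel phase difference) w.r.t. channel 0.
    log_mag = np.log(np.abs(stft) + 1e-8).mean(axis=1)                   # (C, T)
    ipd = np.cos(np.angle(stft[1:]) - np.angle(stft[:1])).mean(axis=1)   # (C-1, T)
    global_emb = np.concatenate([log_mag, ipd], axis=0)                  # (2C-1, T)

    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = stft[:, lo:hi, :].reshape(C * (hi - lo), T)
        feats.append(np.concatenate([band.real, band.imag, global_emb], axis=0))
    return feats


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_bins = 512 // 2 + 1  # 512-point FFT at 16 kHz (illustrative values)
    edges = mel_subband_edges(n_bins, sr=16000, n_subbands=24)
    X = rng.standard_normal((4, n_bins, 100)) + 1j * rng.standard_normal((4, n_bins, 100))
    feats = subband_features_with_global_embedding(X, edges)
    print(len(feats), [f.shape for f in feats[:3]])
```

Because Mel spacing makes low-frequency subbands only a few bins wide and high-frequency subbands much wider, each per-subband input stays small; appending the same full-band summary to every subband lets a small per-subband model still see information from outside its own frequency range, in line with the abstract's stated aim of assisting the separation in each Mel-subband.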
