Speech Boosting: Low-Latency Live Speech Enhancement for TWS Earbuds

Hanbin Bae; Pavel Andreev; Azat Saginbaev; Nicholas Babaev; WonJun Lee; Hosang Sung; Hoon-Young Cho

INTERSPEECH 2024

Abstract

This paper introduces a speech enhancement solution tailored for on-device use on true wireless stereo (TWS) earbuds. The solution is designed to support conversations in noisy environments while active noise cancellation (ANC) is enabled. The primary challenges for speech enhancement models in this context are computational complexity, which limits on-device usage, and latency, which must stay below 3 ms to preserve live conversation. To address these issues, we evaluated several crucial design elements, including the network architecture and domain, the design of loss functions, the pruning method, and hardware-specific optimization. As a result, we demonstrated substantial improvements in speech enhancement quality over baseline models while simultaneously reducing computational complexity and algorithmic latency.
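To make the 3 ms constraint concrete, the following minimal sketch converts a latency budget into a sample count and checks whether a given frame configuration fits. It assumes one common convention in which algorithmic latency equals frame length plus lookahead; the abstract does not state the paper's own definition, and the sample rates, frame sizes, and function names here are illustrative assumptions, not values from the paper.

```python
def budget_in_samples(budget_ms: float, sample_rate_hz: int) -> float:
    """Number of samples spanned by a latency budget at a given sample rate."""
    return budget_ms * sample_rate_hz / 1000.0


def algorithmic_latency_ms(frame_samples: int, lookahead_samples: int,
                           sample_rate_hz: int) -> float:
    """Latency under the assumed convention: frame length plus lookahead."""
    return 1000.0 * (frame_samples + lookahead_samples) / sample_rate_hz


def fits_budget(frame_samples: int, lookahead_samples: int,
                sample_rate_hz: int, budget_ms: float = 3.0) -> bool:
    """True if the latency is strictly below the budget (default: 3 ms)."""
    latency = algorithmic_latency_ms(frame_samples, lookahead_samples,
                                     sample_rate_hz)
    return latency < budget_ms


if __name__ == "__main__":
    sr = 16000  # hypothetical sample rate, chosen only for illustration
    print(budget_in_samples(3.0, sr))          # samples available in 3 ms
    print(algorithmic_latency_ms(32, 0, sr))   # short frame, no lookahead
    print(fits_budget(32, 0, sr))              # within the budget
    print(fits_budget(512, 0, sr))             # a 32 ms frame exceeds it
```

Under these assumptions, a 3 ms budget at 16 kHz spans only 48 samples, so processing frames of tens of milliseconds cannot meet the constraint regardless of how fast the model runs.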

Topics

Machine Learning > Application Areas > Efficient Computing
Deep Learning > Techniques > Model Architecture

Keywords

computational complexity; speech enhancement; model pruning; algorithmic latency; hardware-specific optimization
