Edged based audio-visual speech enhancement demonstrator

Song Chen; Mandar Gogate; Kia Dashtipour; Jasper Kirton-Wingate; Adeel Hussain; Faiyaz Doctor; Tughrul Arslan; Amir Hussain

INTERSPEECH 2024

Abstract

Difficulty understanding speech in noisy environments is a significant challenge for individuals with hearing loss and a primary reason for non-adherence to hearing aid use. Recent advances that combine artificial intelligence, machine learning, and smartphone technology hold promise for improving and personalizing hearing healthcare. We propose a portable hearing assistive system for speech enhancement in noisy settings, which we anticipate will improve the listening experience of hearing aid users. The system uses a mobile phone's camera, microphone, and speaker, which makes it easy to carry. Raw video and audio data are stored locally on the phone and processed on the device's own processor by an audio-visual speech enhancement algorithm. The algorithm uses a lightweight deep neural network to identify voice signals and lip movements, keeping memory requirements low enough for real-time processing.
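
The on-device streaming structure described in the abstract can be illustrated with a toy Python sketch. Every specific below is an assumption rather than a detail from the paper: 16 kHz audio, 25 fps video, a hypothetical per-frame lip_openness score produced by some face or lip tracker, and a simple gain rule that stands in for the lightweight deep neural network. It is not the authors' model. It only shows how audio can be chunked to match video frames and how a fixed-length context buffer keeps memory bounded during real-time processing.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List

# Assumed stream parameters (hypothetical; not specified in the paper).
SAMPLE_RATE = 16_000  # audio samples per second
VIDEO_FPS = 25        # video frames per second
SAMPLES_PER_FRAME = SAMPLE_RATE // VIDEO_FPS  # 640 audio samples per video frame


@dataclass
class AVChunk:
    """One synchronized unit: the audio covering exactly one video frame."""
    audio: List[float]
    lip_openness: float  # hypothetical lip-activity score in [0, 1]


class StreamingAVEnhancer:
    """Toy real-time loop that keeps only the last `context` chunks in memory."""

    def __init__(self, context: int = 5, floor_gain: float = 0.1) -> None:
        # deque(maxlen=...) discards the oldest chunk automatically,
        # so memory stays bounded however long the stream runs.
        self.buffer: Deque[AVChunk] = deque(maxlen=context)
        self.floor_gain = floor_gain

    def _gain(self) -> float:
        # Hypothetical stand-in for the lightweight DNN: mean lip activity
        # over the buffered context scales the gain between floor_gain and 1.
        activity = sum(c.lip_openness for c in self.buffer) / len(self.buffer)
        return self.floor_gain + (1.0 - self.floor_gain) * activity

    def process(self, chunk: AVChunk) -> List[float]:
        # Reject chunks that are not aligned to exactly one video frame.
        if len(chunk.audio) != SAMPLES_PER_FRAME:
            raise ValueError("audio chunk must span exactly one video frame")
        self.buffer.append(chunk)
        gain = self._gain()
        return [gain * sample for sample in chunk.audio]
```

For example, StreamingAVEnhancer(context=2).process(AVChunk([0.5] * SAMPLES_PER_FRAME, 1.0)) passes the chunk through at full gain because the only buffered frame shows full lip activity. Because the buffer has a fixed maximum length, the memory held by the loop does not grow with stream duration; in a real system, _gain would be replaced by the network's output, for example a time-frequency mask.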

Keywords

speech enhancement, real-time processing, deep neural network, lip movement