A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech

Jean-Marc Valin; Umut Isik; Neerad Phansalkar; Ritwik Giri; Karim Helwani; Arvindh Krishnaswamy

INTERSPEECH 2020

Abstract

Over the past few years, speech enhancement methods based on deep learning have greatly surpassed traditional methods based on spectral subtraction and spectral estimation. Many of these new techniques operate directly in the short-time Fourier transform (STFT) domain, resulting in high computational complexity. In this work, we propose PercepNet, an efficient approach that relies on human perception of speech by focusing on the spectral envelope and on the periodicity of the speech. We demonstrate high-quality, real-time enhancement of fullband (48 kHz) speech with less than 5% of a CPU core.
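The abstract contrasts per-bin STFT processing with working on the spectral envelope. The following is a minimal sketch of that idea, not the paper's implementation: it assumes ERB-spaced bands (32 here, a hypothetical count), a 20 ms (960-sample) frame at 48 kHz, piecewise-constant spreading of band gains back to bins, and a hypothetical oracle gain sqrt(E_clean / E_noisy) standing in for whatever a trained model would predict. All function names are illustrative.

```python
import numpy as np

SAMPLE_RATE = 48_000   # fullband rate from the abstract
FRAME_LEN = 960        # assumption: 20 ms analysis frame at 48 kHz
NUM_BANDS = 32         # assumption: a few dozen perceptual bands

def hz_to_erb(f):
    """ERB-number scale (Glasberg & Moore, 1990)."""
    return 21.4 * np.log10(1.0 + 0.00437 * f)

def erb_to_hz(e):
    """Inverse of hz_to_erb."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def band_mapping(frame_len=FRAME_LEN, sr=SAMPLE_RATE, num_bands=NUM_BANDS):
    """Map every rfft bin to a band whose edges are uniform on the ERB scale."""
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)   # frame_len // 2 + 1 bins
    edges = erb_to_hz(np.linspace(0.0, hz_to_erb(sr / 2), num_bands + 1))
    idx = np.searchsorted(edges, freqs, side="right") - 1
    return np.clip(idx, 0, num_bands - 1)            # Nyquist joins the last band

def band_energies(spec, mapping, num_bands=NUM_BANDS):
    """Spectral envelope: total power per band (one value per band, not per bin)."""
    return np.bincount(mapping, weights=np.abs(spec) ** 2, minlength=num_bands)

def apply_band_gains(spec, gains, mapping):
    """Spread per-band gains back to bins (piecewise constant, a simplification)."""
    return spec * gains[mapping]

def oracle_gains(clean_spec, noisy_spec, mapping, eps=1e-12):
    """Hypothetical target gain sqrt(E_clean / E_noisy) per band, clipped to [0, 1]."""
    e_clean = band_energies(clean_spec, mapping)
    e_noisy = band_energies(noisy_spec, mapping)
    return np.clip(np.sqrt(e_clean / (e_noisy + eps)), 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(FRAME_LEN) / SAMPLE_RATE
    window = np.hanning(FRAME_LEN)
    clean = 0.5 * np.sin(2 * np.pi * 1000.0 * t)     # toy "speech": a 1 kHz tone
    noisy = clean + 0.05 * rng.standard_normal(FRAME_LEN)
    mapping = band_mapping()
    clean_spec = np.fft.rfft(clean * window)
    noisy_spec = np.fft.rfft(noisy * window)
    gains = oracle_gains(clean_spec, noisy_spec, mapping)
    enhanced = np.fft.irfft(apply_band_gains(noisy_spec, gains, mapping), n=FRAME_LEN)
    print(f"{len(noisy_spec)} bins reduced to {len(gains)} band gains")
```

Under these assumptions the script prints "481 bins reduced to 32 band gains": a model only has to estimate a few dozen values per frame instead of one per STFT bin, which is one plausible source of the low complexity the abstract reports. The periodicity component mentioned in the abstract is not covered by this sketch.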

Keywords

computational complexity, speech enhancement, real-time processing, perceptual quality, spectral envelope, perceptual evaluation, fullband speech
