2021 INTERSPEECH INTERSPEECH 2021

Conversion of Airborne to Bone-Conducted Speech with Deep Neural Networks

Abstract

It is a common experience of most speakers that the playback of oneโ€™s own voice sounds strange. This can be mainly attributed to the missing bone-conducted speech signal that is not present in the playback signal. It was also shown that some phonemes have a high bone-conducted relative to air-conducted sound transmission, which means that the bone-conduction filter is phone-dependent. To achieve such a phone-dependent modeling we train different speaker dependent and speaker adaptive speech conversion systems using airborne and bone-conducted speech data from 8 speakers (5 male, 3 female), which allow for the conversion of airborne speech to bone-conducted speech. The systems are based on Long Short-Term Memory (LSTM) deep neural networks, where the speaker adaptive versions with speaker embedding can be used without bone-conduction signals from the target speaker. Additionally we also used models that apply a global filtering. The different models are then evaluated by an objective error metric and a subjective listening experiment, which show that the LSTM based models outperform the global filters.

๐ŸŒ‰ Interdisciplinary Bridge โ€” Artificial Intelligence and Deep Learning and Machine Learning
๐Ÿ Cross-Pollinator โ€” Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio