INTERSPEECH 2020

Semi-Supervised Self-Produced Speech Enhancement and Suppression Based on Joint Source Modeling of Air- and Body-Conducted Signals Using Variational Autoencoder

Abstract

This paper proposes a semi-supervised method for enhancing and suppressing self-produced speech, using a variational autoencoder (VAE) to jointly model self-produced speech recorded with air- and body-conductive microphones. In enhancement and suppression of self-produced speech, body-conducted signals can serve as an acoustic cue because they are robust against external noise and predominantly contain self-produced speech. We previously developed a semi-supervised method based on an improved source modeling approach called joint source modeling, which captures the nonlinear correspondence between air- and body-conducted signals using non-negative matrix factorization (NMF). This prevents the enhanced and suppressed air-conducted self-produced speech from being contaminated by the characteristics of body-conducted signals. However, our previous method employs a rank-1 spatial model, which is effective but difficult to apply in more practical situations. Furthermore, joint source modeling depends on the representation capability of NMF. As a result, enhancement and suppression performance is limited. To overcome these limitations, this paper employs a full-rank spatial model and proposes joint source modeling of air- and body-conducted signals using a VAE, which has been shown to represent source signals more accurately than NMF. Experimental results revealed that the proposed method outperformed baseline methods.
