INTERSPEECH 2024

Blind Zero-Shot Audio Restoration: A Variational Autoencoder Approach for Denoising and Inpainting

Abstract

We address the task of blind 'zero-shot' audio signal denoising and inpainting. In the blind zero-shot setting, only the corrupted audio signal itself is used for restoration; no other signals are available to train the model. For this challenging setting, we apply a recent variational autoencoder that combines advanced probabilistic variational optimization with the flexible data modeling enabled by deep neural networks (DNNs). The investigated approach uses a non-amortized encoder and truncated posteriors as variational distributions. This way, posterior correlations can be approximated, and a theoretically grounded treatment of missing values is directly available. On denoising and inpainting benchmarks, the approach performs competitively with other zero-shot approaches. Our results suggest that combining high-quality probabilistic optimization with DNN optimization is a very promising strategy for challenging audio restoration tasks.
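To make "truncated posteriors" and the "treatment of missing values" concrete, the sketch below computes a truncated variational distribution for a single data point and uses it to fill in missing samples. It rests on assumptions that the abstract does not state: binary latents with a Bernoulli prior, a Gaussian observation model with diagonal covariance, and a fixed per-sample set `states` of latent configurations. How that set is selected and updated (the non-amortized part) and how the decoder is trained are not shown. The names `log_joint`, `truncated_posterior` and `restore` are hypothetical, and the toy linear decoder stands in for a DNN.

```python
import numpy as np


def log_joint(states, y, mask, decoder, pi, sigma2):
    """log p(z, y_obs) for each row z of `states` (assumed model, see lead-in).

    Assumptions: z in {0,1}^H with a Bernoulli(pi) prior, and
    p(y | z) = N(decoder(z), sigma2 * I). Only observed entries (mask == True)
    enter the likelihood, so missing entries are marginalized out exactly.
    """
    mu = decoder(states)                        # (K, D): one decoder mean per state
    resid = y[mask] - mu[:, mask]               # residuals on observed dims only
    n_obs = int(mask.sum())
    log_lik = (-0.5 * np.sum(resid**2, axis=1) / sigma2
               - 0.5 * n_obs * np.log(2.0 * np.pi * sigma2))
    log_prior = np.sum(states * np.log(pi) + (1.0 - states) * np.log(1.0 - pi), axis=1)
    return log_lik + log_prior


def truncated_posterior(states, y, mask, decoder, pi, sigma2):
    """q(z) proportional to p(z, y_obs) on the truncated set, zero elsewhere."""
    lj = log_joint(states, y, mask, decoder, pi, sigma2)
    w = np.exp(lj - lj.max())                   # subtract max for numerical stability
    return w / w.sum()                          # normalize over the truncated set only


def restore(states, y, mask, decoder, pi, sigma2):
    """Keep observed samples; fill missing ones with E_q[decoder(z)]."""
    q = truncated_posterior(states, y, mask, decoder, pi, sigma2)
    estimate = q @ decoder(states)              # (D,) posterior-mean reconstruction
    x = y.copy()
    x[~mask] = estimate[~mask]
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 3))                 # toy linear decoder: D=4, H=3
    decoder = lambda Z: Z @ W.T
    # Hypothetical truncated set of 4 out of 2**3 = 8 binary states.
    states = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]], dtype=float)
    y = np.array([0.5, np.nan, -0.2, 1.0])     # NaN marks a missing sample
    mask = ~np.isnan(y)
    print(truncated_posterior(states, y, mask, decoder, 0.2, 0.1))
    print(restore(states, y, mask, decoder, 0.2, 0.1))
```

Normalizing over the truncated set keeps the cost proportional to its size rather than to all 2^H states, while q can still place mass on several correlated configurations. Because the assumed noise covariance is diagonal, dropping the missing dimensions from the likelihood is exact marginalization rather than a heuristic.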
