A Modified Algorithm for Multiple Input Spectrogram Inversion

Dongxiao Wang; Hirokazu Kameoka; Koichi Shinoda

INTERSPEECH 2019

Abstract

We propose a new algorithm for estimating the phase of a speech signal in a mixture of audio sources, under the assumption that the magnitude spectrum of each source is given. The previous method, the multiple input spectrogram inversion (MISI) algorithm, often performs poorly when the estimated magnitude spectrograms are inaccurate. This may be because it imposes a strict constraint that the sum of the source waveforms must exactly equal the mixture waveform. The proposed algorithm employs a new objective function in which this constraint is relaxed: the difference between the sum of the source waveforms and the mixture waveform becomes a quantity to be minimized. We evaluate our method, modified MISI, in two different experimental settings. In both settings, it improves audio source separation performance compared with MISI.
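
To make the contrast between the strict and the relaxed constraint concrete, the following is a minimal sketch and not the paper's algorithm. It assumes time-domain source estimates are already available (for example, from an inverse STFT step, omitted here) and compares two ways of correcting them. The hard projection distributes the mixture error equally across sources so that their sum matches the mixture exactly, as in the strict constraint described above. The relaxed update is an illustrative quadratic-penalty formulation chosen for this sketch: it minimizes the squared change to each estimate plus `lam` times the squared mixture error. The weight `lam` and all function names are hypothetical and do not come from the paper.

```python
def mixture_residual(mixture, sources):
    """Per-sample error between the mixture and the sum of the source estimates."""
    return [y - sum(column) for y, column in zip(mixture, zip(*sources))]


def hard_projection(mixture, sources):
    """Strict constraint: add an equal share of the error to every source,
    so the corrected sources sum exactly to the mixture."""
    n = len(sources)
    err = mixture_residual(mixture, sources)
    return [[s + e / n for s, e in zip(src, err)] for src in sources]


def relaxed_update(mixture, sources, lam):
    """Illustrative relaxation (an assumption of this sketch, not the paper's
    objective): closed-form minimizer of
        sum_i ||x_i - s_i||^2 + lam * ||y - sum_i x_i||^2,
    which adds only a fraction lam / (1 + lam * n) of the error to each source."""
    n = len(sources)
    gain = lam / (1.0 + lam * n)
    err = mixture_residual(mixture, sources)
    return [[s + gain * e for s, e in zip(src, err)] for src in sources]


# Toy signals: a 4-sample mixture and two imperfect source estimates.
mixture = [1.0, 0.0, -1.0, 0.5]
estimates = [[0.4, 0.1, -0.3, 0.0], [0.2, 0.1, -0.5, 0.1]]

hard = hard_projection(mixture, estimates)
soft = relaxed_update(mixture, estimates, lam=1.0)

print("residual before:", mixture_residual(mixture, estimates))
print("residual, hard :", mixture_residual(mixture, hard))
print("residual, soft :", mixture_residual(mixture, soft))
```

Under this penalty formulation, the residual after the relaxed update is the original residual scaled by 1 / (1 + lam * n), so a large `lam` approaches the hard projection while a small `lam` leaves the estimates almost untouched. When the given magnitudes are inaccurate, forcing the residual to zero pushes the full error into the source estimates, which is the behavior the relaxation is meant to soften.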

Topics

Speech & Audio > Synthesis > Speech Enhancement
Mathematics & Optimization > Optimization > Continuous Optimization

Keywords

audio source separation, objective function, phase estimation, spectrogram inversion
