INTERSPEECH 2023

MTANet: Multi-band Time-frequency Attention Network for Singing Melody Extraction from Polyphonic Music

Abstract

Singing melody extraction is an important task in music information retrieval. In this paper, we propose a multi-band time-frequency attention network (MTANet) for singing melody extraction from polyphonic music, which generates a feature representation that characterizes the fundamental frequency (F0) component. A band partition scheme is proposed to fit the position distribution of the F0 component, and three hourglass sub-networks are used to capture the multi-band features. A feature fusion module (FFM) then fuses these multi-band features. Visualization analysis shows that the multi-band feature extraction branch effectively generates a feature representation that characterizes the F0 component. Experimental results show that MTANet outperforms existing state-of-the-art methods while using fewer network parameters. Visualized results further show that MTANet reduces both octave errors and melody detection errors.
