INTERSPEECH 2024

Cross-Attention-Guided WaveNet for EEG-to-MEL Spectrogram Reconstruction

Abstract

This paper introduces a cross-attention-guided WaveNet that uses a coarse-to-fine granularity strategy to improve the detailed reconstruction of Mel spectrograms from time-domain EEG signals. The proposed model uses WaveNet to reconstruct, in sequence, the envelope, 10-band Mel, 80-band Mel, and magnitude, each at a finer granularity level than the last. A cross-attention mechanism explores correlations across modalities to address the modality gap. A combined loss function and Mixup augmentation are also employed to improve reconstruction performance. Our approach achieves Pearson correlation values of 0.0651 ± 0.0153 on the validation set and 0.0413 ± 0.0169 on the held-out-subjects test set, placing second in the 2024 Auditory EEG Challenge. Ablation experiments validate the contribution of each module. The source code is available online.
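
As a rough illustration of two ingredients named in the abstract, the sketch below shows the Pearson correlation used as the evaluation metric and a generic Mixup step for paired regression data. It is not the authors' implementation: the function names `pearson` and `mixup`, the choice to blend both the EEG input and the Mel target with one shared weight, and the value `alpha=0.2` are all assumptions made for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length 1-D sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    # Return 0.0 for a constant sequence, where the correlation is undefined.
    return cov / denom if denom > 0 else 0.0


def mixup(eeg_a, mel_a, eeg_b, mel_b, alpha=0.2, rng=random):
    """Blend two (EEG, Mel) pairs with a shared Beta(alpha, alpha) weight.

    alpha=0.2 is an illustrative assumption, not a value from the paper.
    """
    lam = rng.betavariate(alpha, alpha)
    eeg = [lam * a + (1 - lam) * b for a, b in zip(eeg_a, eeg_b)]
    mel = [lam * a + (1 - lam) * b for a, b in zip(mel_a, mel_b)]
    return eeg, mel, lam
```

In a setup like this, `pearson` would typically be computed per Mel band between the reconstructed and reference spectrograms and then averaged, while `mixup` would be applied to randomly paired training segments before each batch.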
