ASA: An Auditory Spatial Attention Dataset with Multiple Speaking Locations

Zijie Lin; Tianyu He; Siqi Cai; Haizhou Li

INTERSPEECH 2024

Abstract

Recent studies have demonstrated the feasibility of localizing an attended sound source from electroencephalography (EEG) signals in a cocktail party scenario, a task referred to as EEG-enabled Auditory Spatial Attention Detection (ASAD). Despite this promise, ASAD datasets remain scarce, and most existing ones are recorded with only two speaking locations. To bridge this gap, we introduce a new Auditory Spatial Attention (ASA) dataset featuring multiple speaking locations of sound sources. The new dataset is designed to challenge and refine deep neural network solutions for real-world applications. Furthermore, we build a channel attention convolutional neural network (CA-CNN) as a reference model for ASA, which serves as a competitive benchmark for future studies.
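
The abstract does not describe how the CA-CNN computes its channel attention, so the following is only a generic illustration of the idea, not the authors' model. It is a minimal sketch assuming a squeeze-and-excitation-style gate: each EEG channel is summarized by its mean power over time, a small two-layer bottleneck maps those summaries to one weight per channel in (0, 1), and each channel is rescaled by its weight. The function name `channel_attention`, the weight matrices `w1` and `w2`, the use of mean power as the channel descriptor, and all sizes are hypothetical choices made for this example.

```python
import math
import random


def channel_attention(eeg, w1, w2):
    """Rescale EEG channels with a squeeze-and-excitation-style gate.

    eeg: list of channels, each a list of time samples.
    w1:  hidden x channels weight matrix (hypothetical, normally learned).
    w2:  channels x hidden weight matrix (hypothetical, normally learned).
    Returns (reweighted_eeg, channel_weights).
    """
    # Squeeze: one descriptor per channel (mean power over time, an assumption).
    squeezed = [sum(x * x for x in ch) / len(ch) for ch in eeg]
    # Excite: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    weights = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # Scale: multiply every sample of a channel by that channel's weight.
    reweighted = [[wt * x for x in ch] for wt, ch in zip(weights, eeg)]
    return reweighted, weights


# Toy input: 4 channels, 16 samples, random values standing in for real EEG.
random.seed(0)
n_channels, n_samples, n_hidden = 4, 16, 2
eeg = [[random.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_channels)]
w1 = [[random.uniform(-0.5, 0.5) for _ in range(n_channels)] for _ in range(n_hidden)]
w2 = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_channels)]

reweighted, weights = channel_attention(eeg, w1, w2)
print([round(w, 3) for w in weights])
```

In a trained network the two weight matrices would be learned jointly with the convolutional layers, so that channels carrying more information about the attended location receive larger weights; here they are random and serve only to show the data flow.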

Topics

Deep Learning > Architectures > Neural Networks
Healthcare & Medicine > Research > Biosignal Processing
Speech & Audio > Analysis > Speech Analysis

Keywords

channel attention; sound localization; convolutional neural network; auditory spatial attention
