INTERSPEECH 2020

Atss-Net: Target Speaker Separation via Attention-Based Neural Network

Abstract

Recently, models based on convolutional neural networks (CNNs) and long short-term memory (LSTM) have been introduced for deep learning-based target speaker separation. In this paper, we propose an attention-based neural network (Atss-Net) that operates in the spectrogram domain for this task. Compared with the CNN-LSTM architecture, it allows the network to compute the correlation between features in parallel and to extract more features with shallower layers. Experimental results show that Atss-Net outperforms VoiceFilter, although it contains only half as many parameters. Furthermore, the proposed model also demonstrates promising performance in speech enhancement.
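
To illustrate what computing the correlation between features in parallel can look like, the following is a minimal sketch of generic scaled dot-product self-attention applied to spectrogram frames. It is not the Atss-Net architecture itself: the function names, the toy dimensions `T`, `F`, `D`, and the random projection matrices are all assumptions made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames, w_q, w_k, w_v):
    """Generic scaled dot-product self-attention (illustrative, not Atss-Net).

    frames: T rows, each with F spectrogram bins.
    w_q, w_k, w_v: hypothetical F x D projection matrices.
    Returns (T x D attended features, T x T attention weights).
    """
    q = matmul(frames, w_q)
    k = matmul(frames, w_k)
    v = matmul(frames, w_v)
    k_t = [list(col) for col in zip(*k)]  # transpose of k, D x T
    scale = math.sqrt(len(k[0]))
    # Every frame is scored against every other frame in one matrix product,
    # rather than step by step as in a recurrent layer.
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy sizes, assumed for illustration only.
random.seed(0)
T, F, D = 6, 8, 4
frames = [[random.gauss(0, 1) for _ in range(F)] for _ in range(T)]
w_q, w_k, w_v = (
    [[random.gauss(0, 1) for _ in range(D)] for _ in range(F)] for _ in range(3)
)
out, weights = self_attention(frames, w_q, w_k, w_v)
print(len(out), len(out[0]), len(weights), len(weights[0]))
```

Each row of `weights` sums to one and describes how strongly one time frame attends to every other frame; a practical model would learn the projections and typically use several attention heads.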
