Atss-Net: Target Speaker Separation via Attention-Based Neural Network

Tingle Li; Qingjian Lin; Yuanyuan Bao; Ming Li

INTERSPEECH 2020

Abstract

Recently, convolutional neural network (CNN) and long short-term memory (LSTM) based models have been introduced for deep learning-based target speaker separation. In this paper, we propose an attention-based neural network (Atss-Net) that operates in the spectrogram domain for this task. Compared with the CNN-LSTM architecture, it allows the network to compute the correlations between features in parallel and to extract more features with shallower layers. Experimental results show that Atss-Net outperforms VoiceFilter while containing only half as many parameters. Furthermore, the proposed model also shows promising performance on speech enhancement.
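The claim that attention "computes the correlation between each feature in parallel" can be illustrated with a minimal NumPy sketch. This is not the Atss-Net architecture from the paper. It is a single-head scaled dot-product self-attention layer over spectrogram frames, followed by a sigmoid mask applied to the mixture magnitude. The following are all illustrative assumptions: conditioning on the target speaker by concatenating a speaker embedding (d-vector-like, in the spirit of VoiceFilter) to every frame, the function names `self_attention` and `separate_target`, the `params` dictionary, and the random untrained weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over time frames.

    x: (T, D_in) frame features. The (T, T) score matrix is produced by one
    matrix product, so every pair of frames is compared in parallel instead
    of step by step as in a recurrent (LSTM) layer.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (T, T) pairwise frame similarities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights

def separate_target(mix_mag, spk_emb, params):
    """Hypothetical mask-based target-speaker separation step.

    mix_mag: (T, F) mixture magnitude spectrogram.
    spk_emb: (E,) target-speaker embedding (assumed d-vector-like).
    """
    T = mix_mag.shape[0]
    # Assumption: condition on the speaker by appending the embedding to every frame.
    cond = np.concatenate([mix_mag, np.tile(spk_emb, (T, 1))], axis=1)  # (T, F + E)
    h, _ = self_attention(cond, params["w_q"], params["w_k"], params["w_v"])
    mask = 1.0 / (1.0 + np.exp(-(h @ params["w_out"])))  # sigmoid mask, (T, F)
    return mask * mix_mag  # estimated target magnitude

# Usage with random, untrained weights (shapes only; no separation quality implied).
rng = np.random.default_rng(0)
T, F, E, D = 50, 257, 16, 64
params = {
    "w_q": rng.standard_normal((F + E, D)) * 0.05,
    "w_k": rng.standard_normal((F + E, D)) * 0.05,
    "w_v": rng.standard_normal((F + E, D)) * 0.05,
    "w_out": rng.standard_normal((D, F)) * 0.05,
}
mix = np.abs(rng.standard_normal((T, F)))
emb = rng.standard_normal(E)
est = separate_target(mix, emb, params)
print(est.shape)  # (50, 257)
```

Because the mask lies in (0, 1), the estimate never exceeds the mixture magnitude in any time-frequency bin. A trained system would learn these weights and would typically stack several attention layers, possibly with multiple heads.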