INTERSPEECH 2023

FTA-net: A Frequency and Time Attention Network for Speech Depression Detection

Abstract

Depression is one of the most common mental disorders and seriously affects individual health. Studies have shown an association between an individual's level of depression and features of their speech, and many automatic speech-based depression detection systems have been proposed as a result. A number of these systems use convolutional neural networks (CNNs). However, most do not take into account that different frequencies and time steps in speech spectral features contribute unequally to the detection of depression. To extract more salient and discriminative features, this paper proposes a frequency-time attention (FTA) module for CNNs, which is based on squeeze-and-excitation operations and emphasizes the time steps and frequencies associated with depression. Experimental results on the AVEC 2013 and AVEC 2014 benchmarks demonstrate the effectiveness of the proposed method.
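
To make the idea of squeeze-and-excitation attention over frequency and time concrete, the following is a minimal NumPy sketch. It is not the paper's implementation: the abstract does not specify the architecture, so the feature-map layout (channels, frequency, time), the use of average pooling as the squeeze step, the two-layer bottleneck with a reduction ratio of 4, the random weights, and the names `fta_sketch`, `se_weights`, and `init_params` are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_weights(squeezed, w1, w2):
    """Excitation step: bottleneck MLP (ReLU, then sigmoid) on a squeezed vector."""
    hidden = np.maximum(0.0, w1 @ squeezed)
    return sigmoid(w2 @ hidden)


def init_params(n_freq, n_time, reduction=4, seed=0):
    """Random weights standing in for learned ones (assumption for the sketch)."""
    rng = np.random.default_rng(seed)
    hf = max(1, n_freq // reduction)
    ht = max(1, n_time // reduction)
    return {
        "f1": rng.normal(0.0, 0.1, (hf, n_freq)),
        "f2": rng.normal(0.0, 0.1, (n_freq, hf)),
        "t1": rng.normal(0.0, 0.1, (ht, n_time)),
        "t2": rng.normal(0.0, 0.1, (n_time, ht)),
    }


def fta_sketch(x, params):
    """Reweight a (channels, frequency, time) feature map along frequency and time.

    Assumed design: squeeze by average pooling, excite with a small MLP,
    then scale the input by the resulting per-frequency and per-time weights.
    """
    # Squeeze: one descriptor per frequency bin and one per time step.
    freq_desc = x.mean(axis=(0, 2))  # shape (F,)
    time_desc = x.mean(axis=(0, 1))  # shape (T,)

    # Excitation: weights in (0, 1) for each frequency bin and time step.
    freq_w = se_weights(freq_desc, params["f1"], params["f2"])
    time_w = se_weights(time_desc, params["t1"], params["t2"])

    # Scale: broadcast the weights back over the feature map.
    out = x * freq_w[None, :, None] * time_w[None, None, :]
    return out, freq_w, time_w


if __name__ == "__main__":
    # Hypothetical CNN feature map: 8 channels, 64 frequency bins, 100 time steps.
    features = np.random.default_rng(1).normal(size=(8, 64, 100))
    out, freq_w, time_w = fta_sketch(features, init_params(64, 100))
    print(out.shape, freq_w.shape, time_w.shape)
```

In a trained network the weight matrices would be learned jointly with the CNN, so that frequency bins and time steps relevant to depression receive weights closer to 1 and less relevant ones are attenuated.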
