INTERSPEECH 2023
Joint Time and Frequency Transformer for Chinese Opera Classification
Abstract
Transformers have recently gained attention and are now widely used in audio tasks. Most existing approaches compute attention either directly over the entire time-frequency space or only along the temporal dimension. This paper presents a joint time and frequency model for Chinese opera classification. A shallow convolutional block extracts localized low-level semantic features and reduces the feature map size. Criss-cross attention and factorised self-attention are then employed to extract representations across both the time and frequency dimensions. Experimental results demonstrate that the proposed model achieves state-of-the-art performance on a large Chinese opera dataset with fewer model parameters.
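To make the idea of factorised attention over time and frequency concrete, the following is a minimal sketch and not the authors' implementation. It assumes a feature map of shape (T, F, D), such as a shallow convolutional block might produce, and applies single-head self-attention first along the time axis and then along the frequency axis, each with a residual connection. The function names, the single-head design, the ordering of the two steps, and the randomly initialised projection matrices are all assumptions made for illustration; the abstract does not specify these details, and criss-cross attention is not shown.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def axis_attention(x, wq, wk, wv):
    """Single-head self-attention over the second-to-last axis.

    x has shape (..., L, D); the leading axes are treated as a batch.
    wq, wk, wv are hypothetical (D, D) projection matrices.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def factorised_tf_attention(x, params_t, params_f):
    """Attend along time, then along frequency (illustrative ordering).

    x has shape (T, F, D). params_t and params_f are (wq, wk, wv) tuples.
    """
    # Time step: for each frequency bin, the sequence runs over T.
    xt = np.swapaxes(x, 0, 1)                  # (F, T, D)
    xt = xt + axis_attention(xt, *params_t)    # residual connection
    x = np.swapaxes(xt, 0, 1)                  # back to (T, F, D)
    # Frequency step: for each time frame, the sequence runs over F.
    x = x + axis_attention(x, *params_f)       # residual connection
    return x


def random_params(rng, d):
    # Hypothetical initialisation, for demonstration only.
    return tuple(rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, F, D = 6, 4, 8
    x = rng.standard_normal((T, F, D))
    out = factorised_tf_attention(x, random_params(rng, D), random_params(rng, D))
    print(out.shape)  # (6, 4, 8)
```

In this sketch the number of pairwise attention scores is T·F·(T + F), compared with (T·F)² when attention is computed over the flattened time-frequency grid, which is the usual motivation for factorising.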