Temporal Transformer Networks for Acoustic Scene Classification

Teng Zhang; Kailai Zhang; Ji Wu

INTERSPEECH 2018

Abstract

Neural networks have proven to be powerful models for acoustic scene classification tasks, but they are still limited by a lack of temporal invariance with respect to the audio data. In this paper, a novel temporal transformer module is proposed to allow the temporal manipulation of data in neural networks. The module is composed of a Fourier transform layer for feature maps and a learnable feature reduction layer, and it can be inserted into existing convolutional neural network (CNN) and long short-term memory (LSTM) models. Experiments on the LITIS Rouen and DCASE2016 datasets show that the proposed method leads to a significant improvement over the existing neural networks. Our approach performs significantly better than the state-of-the-art result on the LITIS Rouen dataset, obtaining a relative reduction of 23.6% in classification error.
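The abstract names two components, a Fourier transform layer over feature maps and a learnable feature reduction layer, without giving their details. The following is a minimal sketch of one plausible reading, not the authors' implementation: it assumes the Fourier transform is taken along the time axis and only its magnitude is kept, and that the reduction is a single linear projection over frequency bins. The function name, shapes, and random weights are hypothetical and chosen only for illustration.

```python
import numpy as np


def temporal_transformer_sketch(feature_map, weights):
    """Illustrative only; the layer design is an assumption, not taken from the paper.

    feature_map: array of shape (channels, time)
    weights:     array of shape (time // 2 + 1, reduced_dim), standing in for
                 the parameters of a learnable feature reduction layer
    """
    # Assumed Fourier transform layer: magnitude of the real DFT along time.
    # The DFT magnitude does not change under circular shifts in time.
    spectrum = np.abs(np.fft.rfft(feature_map, axis=-1))
    # Assumed feature reduction layer: linear projection of the frequency bins.
    return spectrum @ weights


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))           # hypothetical feature map
w = rng.standard_normal((16 // 2 + 1, 3))  # hypothetical reduction weights

out = temporal_transformer_sketch(x, w)
shifted_out = temporal_transformer_sketch(np.roll(x, 5, axis=-1), w)
```

Under these assumptions, a circular shift of the input along time leaves the output unchanged up to floating-point error, which is one way a Fourier-based layer can give a network a form of temporal invariance.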

Topics

Machine Learning > Core Methods > Classification
Deep Learning > Architectures > Transformers
Deep Learning > Techniques > Model Architecture
Speech & Audio > Analysis > Speech Analysis

Keywords

Fourier transform, audio classification, convolutional neural network, long short-term memory, temporal transformer, feature reduction, acoustic scene classification, learnable feature reduction, Fourier transform layer
