Acoustic Scene Classification Using a CNN-SuperVector System Trained with Auditory and Spectrogram Image Features

Rakib Hyder; Shabnam Ghaffarzadegan; Zhe Feng; John H.L. Hansen; Taufiq Hasan

2017 INTERSPEECH INTERSPEECH 2017

Acoustic Scene Classification Using a CNN-SuperVector System Trained with Auditory and Spectrogram Image Features

Abstract

Enabling smart devices to infer about the environment using audio signals has been one of the several long-standing challenges in machine listening. The availability of public-domain datasets, e.g., Detection and Classification of Acoustic Scenes and Events (DCASE) 2016, enabled researchers to compare various algorithms on standard predefined tasks. Most of the current best performing individual acoustic scene classification systems utilize different spectrogram image based features with a Convolutional Neural Network (CNN) architecture. In this study, we first analyze the performance of a state-of-the-art CNN system for different auditory image and spectrogram features, including Mel-scaled, logarithmically scaled, linearly scaled filterbank spectrograms, and Stabilized Auditory Image (SAI) features. Next, we benchmark an MFCC based Gaussian Mixture Model (GMM) SuperVector (SV) system for acoustic scene classification. Finally, we utilize the activations from the final layer of the CNN to form a SuperVector (SV) and use them as feature vectors for a Probabilistic Linear Discriminative Analysis (PLDA) classifier. Experimental evaluation on the DCASE 2016 database demonstrates the effectiveness of the proposed CNN-SV approach compared to conventional CNNs with a fully connected softmax output layer. Score fusion of individual systems provides up to 7% relative improvement in overall accuracy compared to the CNN baseline system.

🌉 Interdisciplinary Bridge — Computer Vision and Machine Learning

🧭 Keyword Pioneer — super vector

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Rakib Hyder , Shabnam Ghaffarzadegan , Zhe Feng , John H.L. Hansen , Taufiq Hasan

Topics

Machine Learning > Core Methods > Classification Computer Vision > Processing > Image Segmentation

Keywords

convolutional neural network probabilistic linear discriminant analysis acoustic scene classification spectrogram feature super vector

Download PDF

Related papers

Description of the Munich-Passau Snore Sound Corpus (MPSSC) 2017

A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification 2017

Binaural Reverberant Speech Separation Based on Deep Neural Networks 2017

Building Audio-Visual Phonetically Annotated Arabic Corpus for Expressive Text to Speech 2017

A Comparison of Danish Listeners’ Processing Cost in Judging the Truth Value of Norwegian, Swedish, and English Sentences 2017