Deep Convolutional Neural Network with Scalogram for Audio Scene Modeling

Hangting Chen; Pengyuan Zhang; Haichuan Bai; Qingsheng Yuan; Xiuguo Bao; Yonghong Yan

INTERSPEECH 2018

Abstract

Deep learning has recently improved the performance of acoustic scene classification. However, learning is usually based on the short-time Fourier transform and hand-tailored filters, and learning directly from raw signals remains a major challenge. In this paper, we propose an approach to learning audio scene patterns from the scalogram, which is extracted from the raw signal with simple wavelet transforms. The experiments were conducted on the DCASE2016 dataset. A comparison of the scalogram with classical Mel energy showed that the multi-scale feature led to a clear increase in accuracy. The convolutional neural network integrated with the maximum-average downsampled scalogram achieved an accuracy of 90.5% in the DCASE2016 evaluation step.
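To make the feature pipeline concrete, the following is a minimal sketch of how a scalogram might be computed from a raw waveform and then downsampled along time. It is not the authors' implementation: the abstract does not specify the wavelet, the scales, or the exact pooling scheme. The sketch assumes NumPy, a complex Morlet wavelet with w0 = 6, geometrically spaced center frequencies, and non-overlapping time windows whose maximum and mean are stacked as two channels, which is only one possible reading of "maximum-average downsampled". All function names and parameter values are hypothetical.

```python
import numpy as np


def morlet_wavelet(scale, sample_rate, w0=6.0):
    """Complex Morlet wavelet dilated by `scale` (in seconds). Assumed wavelet choice."""
    half_width = int(np.ceil(4.0 * scale * sample_rate))  # cover +/- 4 std devs
    t = np.arange(-half_width, half_width + 1) / sample_rate
    x = t / scale
    wavelet = np.exp(1j * w0 * x) * np.exp(-0.5 * x ** 2)
    return wavelet / np.sqrt(scale * sample_rate)  # rough energy normalisation


def scalogram(signal, sample_rate, center_freqs, w0=6.0):
    """Magnitude of the wavelet coefficients, one row per center frequency."""
    rows = []
    for freq in center_freqs:
        scale = w0 / (2.0 * np.pi * freq)  # scale whose peak response is at `freq`
        kernel = morlet_wavelet(scale, sample_rate, w0)
        coeffs = np.convolve(signal, kernel, mode="same")
        rows.append(np.abs(coeffs))
    return np.stack(rows)  # shape: (n_bands, n_samples)


def max_average_downsample(scalo, window):
    """Pool non-overlapping time windows with both max and mean (assumed scheme)."""
    n_bands, n_samples = scalo.shape
    n_windows = n_samples // window
    blocks = scalo[:, : n_windows * window].reshape(n_bands, n_windows, window)
    # Channel 0 holds the per-window maximum, channel 1 the per-window mean.
    return np.stack([blocks.max(axis=2), blocks.mean(axis=2)])


# Usage on a synthetic 440 Hz tone (illustrative values only).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 440.0 * t)
freqs = np.geomspace(100.0, 2000.0, num=16)

scalo = scalogram(tone, sample_rate, freqs)              # shape (16, 8000)
features = max_average_downsample(scalo, window=400)     # shape (2, 16, 20)
```

The resulting array has a channel, frequency, and time layout that a convolutional network can consume like a two-channel image. Unlike a fixed-window short-time Fourier transform, each band here uses a wavelet whose duration shrinks as its center frequency grows, which is the multi-scale property the abstract refers to.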

Topics

Machine Learning > Core Methods > Classification
Deep Learning > Architectures > Neural Networks
Computer Vision > Analysis > Scene Understanding
Speech & Audio > Analysis > Speech Analysis
Machine Learning > Learning Types > Deep Learning
Deep Learning > Learning Types > Deep Learning

Keywords

wavelet transform; convolutional neural network; acoustic scene classification; audio scene modeling
