Indoor/Outdoor Audio Classification Using Foreground Speech Segmentation

Banriskhem K. Khonglah; K.T. Deepak; S.R. Mahadeva Prasanna

INTERSPEECH 2017

Abstract

This work addresses indoor/outdoor audio classification using foreground speech segmentation. Foreground speech segmentation uses signal features to separate foreground speech from background interfering sources such as noise. First, the foreground and background segments are obtained by foreground speech segmentation, using the normalized autocorrelation peak strength (NAPS) of the zero frequency filtered signal (ZFFS) as the feature. The background segments are then used to determine whether a given segment comes from an indoor or an outdoor audio sample. Mel frequency cepstral coefficients (MFCCs) are extracted from the background segments of both the indoor and outdoor audio samples and are used to train a support vector machine (SVM) classifier. The use of foreground speech segmentation gives promising performance on the indoor/outdoor audio classification task.
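To make the NAPS feature concrete, here is a minimal sketch of a normalized autocorrelation peak strength computed on a single frame. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact NAPS definition, the frame length, or the lag range, so the function names, the 8 kHz sampling rate, the 30 ms frame, and the 20 to 100 sample lag range are all hypothetical. The zero frequency filtering step that produces the ZFFS is omitted, so the sketch operates on whatever frame it is given.

```python
import math
import random


def normalized_autocorrelation(frame):
    """Autocorrelation of a mean-removed frame for lags 0..N-1.

    Values are divided by the lag-0 energy, so lag 0 equals 1
    for any frame that is not constant.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return [0.0] * n  # constant frame: no periodicity to measure
    return [
        sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        for lag in range(n)
    ]


def naps(frame, min_lag, max_lag):
    """Peak strength, taken here as the largest normalized
    autocorrelation value in the assumed lag range [min_lag, max_lag]."""
    r = normalized_autocorrelation(frame)
    max_lag = min(max_lag, len(r) - 1)
    return max(r[min_lag:max_lag + 1])


# Hypothetical settings: 8 kHz sampling, 30 ms frame, lags of 2.5 to 12.5 ms.
FS = 8000
FRAME_LEN = 240
MIN_LAG, MAX_LAG = 20, 100

# A 100 Hz sinusoid stands in for a strongly periodic frame,
# and seeded Gaussian noise stands in for an aperiodic frame.
periodic = [math.sin(2 * math.pi * 100 * t / FS) for t in range(FRAME_LEN)]
rng = random.Random(0)
noisy = [rng.gauss(0.0, 1.0) for _ in range(FRAME_LEN)]

naps_periodic = naps(periodic, MIN_LAG, MAX_LAG)
naps_noisy = naps(noisy, MIN_LAG, MAX_LAG)
```

In this toy setup the periodic frame scores higher than the noise frame, which is the kind of contrast a frame-level threshold could use to label frames as foreground or background. The threshold value, and the later MFCC extraction and SVM training on the background frames, would need to follow the paper itself.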

Topics

Machine Learning > Core Methods > Classification

Keywords

speech processing, audio classification, support vector machine, mel frequency cepstral coefficient, foreground speech segmentation
