INTERSPEECH 2024

Multi-label Bird Species Classification from Field Recordings using Mel_Graph-GCN Framework

Abstract

This paper proposes the Mel_Graph-GCN framework, which uses graph convolutional networks (GCNs) to identify multiple bird species in field recordings. A trained deep convolutional neural network (deep CNN) converts the Mel-spectrogram of each audio file into a graph, and SpecAugment generates additional Mel-spectrograms to improve training of the deep CNN. The graph is then fed to a GCN for classification. The framework is evaluated on the Xeno-canto bird sound database and compared with state-of-the-art models, which it outperforms with a macro F1 score of 0.85.
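For readers unfamiliar with the building blocks named in the abstract, the following is a minimal sketch of three of them: SpecAugment-style masking, a single graph-convolution step, and the macro F1 metric. It is not the authors' implementation. The abstract does not say how the graph is built from the CNN, so that step is omitted. The mask widths, the function names, and the use of the commonly cited normalized-adjacency propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W) are all assumptions made for illustration, and plain Python lists are used only to keep the sketch dependency-free.

```python
import math
import random


def spec_augment(mel, max_freq_mask=8, max_time_mask=16, rng=None):
    """Return a copy of `mel` (rows = Mel bands, columns = frames) with one
    random frequency band and one random time band zeroed out.
    The mask widths are illustrative defaults, not values from the paper."""
    rng = rng or random.Random()
    n_mels, n_frames = len(mel), len(mel[0])
    f = rng.randint(0, min(max_freq_mask, n_mels))    # frequency mask width
    f0 = rng.randint(0, n_mels - f)                   # frequency mask start
    t = rng.randint(0, min(max_time_mask, n_frames))  # time mask width
    t0 = rng.randint(0, n_frames - t)                 # time mask start
    return [
        [0.0 if (f0 <= i < f0 + f or t0 <= j < t0 + t) else v
         for j, v in enumerate(row)]
        for i, row in enumerate(mel)
    ]


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).
    Assumed propagation rule; the paper's exact layer may differ."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    a_norm = [[d_inv_sqrt[i] * a_hat[i][j] * d_inv_sqrt[j] for j in range(n)]
              for i in range(n)]
    hidden = matmul(matmul(a_norm, feats), weight)
    return [[max(v, 0.0) for v in row] for row in hidden]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over multi-label 0/1 matrices
    (rows = recordings, columns = species)."""
    scores = []
    for c in range(len(y_true[0])):
        pairs = [(t[c], p[c]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because macro F1 averages the per-species scores without weighting by how often each species occurs, rare species count as much as common ones, which is one reason it is a common choice for imbalanced multi-label tasks such as this one.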
