A ResNet-50-Based Convolutional Neural Network Model for Language ID Identification from Speech Recordings

Giuseppe G. A. Celano

NAACL 2021

Abstract

This paper describes the model built for the SIGTYP 2021 Shared Task aimed at identifying 18 typologically different languages from speech recordings. Mel-frequency cepstral coefficients derived from audio files are transformed into spectrograms, which are then fed into a ResNet-50-based CNN architecture. The final model achieved validation and test accuracies of 0.73 and 0.53, respectively.
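To make the feature-extraction step concrete, the following is a minimal pure-Python sketch of how a matrix of mel-frequency cepstral coefficients can be computed from a waveform. It is not the author's code: the abstract gives no parameters, so the sample rate, frame length, hop size, number of mel filters, number of coefficients, and the synthetic 440 Hz test tone are all assumptions chosen for illustration. The ResNet-50 stage is not shown.

```python
import math

def hz_to_mel(hz):
    # Common HTK-style mel formula.
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hamming-windowed frame, via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2)
    # n_filters + 2 edge points, expressed as fractional FFT bin positions.
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if left < k <= center:
                row.append((k - left) / (center - left))
            elif center < k < right:
                row.append((right - k) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
            for c in range(n_coeffs)]

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return MFCCs as a list of n_coeffs rows, one column per frame."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # Floor the filter energies so the logarithm is always defined.
        log_mel = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                   for row in bank]
        columns.append(dct2(log_mel, n_coeffs))
    return [list(row) for row in zip(*columns)]

# Assumed toy input: a 440 Hz tone sampled at 8 kHz (not data from the shared task).
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(2048)]
features = mfcc(signal, sample_rate)
n_coeffs, n_frames = len(features), len(features[0])
```

The result is a two-dimensional array (coefficients by frames) that can be rendered as an image and passed to an image classifier. In practice an audio library would replace the naive DFT used here, which is kept only so that the sketch has no dependencies.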

Topics

Machine Learning > Core Methods > Classification
Deep Learning > Architectures > Neural Networks
Speech & Audio > Recognition > Speech Recognition
Machine Learning > Learning Types > Deep Learning
Speech & Audio > Recognition > Language Recognition

Keywords

language identification, convolutional neural network, mel-frequency cepstral coefficient
