NAACL 2021
Language ID Prediction from Speech Using Self-Attentive Pooling
Abstract
This memo describes the NTR-TSU submission to the SIGTYP 2021 Shared Task on predicting language IDs from speech. Spoken Language Identification (LID) is an important step in a multilingual Automated Speech Recognition (ASR) pipeline. For many low-resource and endangered languages, only single-speaker recordings may be available, creating a need for domain- and speaker-invariant language ID systems. In this memo, we show that a convolutional neural network with a Self-Attentive Pooling layer achieves promising results on the language identification task.
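For readers unfamiliar with the pooling step named in the abstract, the following is a minimal sketch of one common formulation of self-attentive pooling, written in plain Python. It is not taken from the submission: the function name, the parameters W, b, and u, and the tanh scoring function are all assumptions chosen for illustration, and in a real model these parameters would be learned. The idea is that each frame-level feature vector receives a scalar score, the scores are normalized with a softmax over time, and the utterance-level vector is the weighted sum of the frames.

```python
import math


def self_attentive_pooling(frames, W, b, u):
    """Pool T frame vectors (each of length D) into a single length-D vector.

    frames: list of T feature vectors, e.g. CNN outputs per time step
    W: projection matrix as a list of D_a rows, each of length D (hypothetical)
    b: bias vector of length D_a (hypothetical)
    u: attention context vector of length D_a (hypothetical)
    Returns (pooled_vector, attention_weights).
    """
    scores = []
    for h in frames:
        # Project the frame and apply tanh: z = tanh(W h + b)
        hidden = [
            math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
            for row, bias in zip(W, b)
        ]
        # Scalar relevance score for this frame: e_t = u . z
        scores.append(sum(ui * zi for ui, zi in zip(u, hidden)))

    # Softmax over time; subtracting the max keeps exp() numerically stable
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the frames gives a fixed-size utterance representation
    dim = len(frames[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, frames)) for d in range(dim)]
    return pooled, weights
```

Because the output length depends only on the feature dimension and not on the number of frames, a layer of this kind lets a classifier handle recordings of different durations; the attention weights also indicate which frames contributed most to the pooled vector.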
Topics
Machine Learning > Learning Types > Self-Supervised Learning
Deep Learning > Architectures > Transformers
Speech & Audio > Recognition > Speech Recognition
Machine Learning > Learning Types > Deep Learning
Artificial Intelligence > Core AI > Language
Speech & Audio > Recognition > Language Recognition