What Does an End-to-End Dialect Identification Model Learn About Non-Dialectal Information?

Shammur A. Chowdhury; Ahmed Ali; Suwon Shon; James Glass

INTERSPEECH 2020

Abstract

An end-to-end dialect identification system generates the likelihood of each dialect given a speech utterance. Its performance relies on its ability to discriminate between the acoustic properties of different dialects, even though the input signal also contains non-dialectal information such as speaker and channel. In this work, we study how non-dialectal information is encoded inside the end-to-end dialect identification model. We design several proxy tasks to understand the model's ability to represent speech input for differentiating non-dialectal information, such as (a) gender and voice identity of speakers, (b) languages, and (c) channel (recording and transmission) quality, and compare these with dialectal information (i.e., predicting the geographic region of the dialects). By analyzing non-dialectal representations from the layers of an end-to-end Arabic dialect identification (ADI) model, we observe that the model retains gender and channel information throughout the network while learning a speaker-invariant representation. Our findings also suggest that the CNN layers of the end-to-end model behave like feature extractors that capture voice-specific information, while the fully-connected layers encode more dialectal information.
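The proxy-task idea in the abstract can be illustrated with a small probing sketch: fit a simple classifier on the representations taken from one layer and treat its held-out accuracy as a rough indicator of how much of a property (for example, gender) that layer encodes. Everything below is an assumption made for illustration. The data is synthetic, the layer names are hypothetical, and a nearest-centroid probe stands in for whatever classifier the authors actually used, which the abstract does not specify.

```python
import random


def nearest_centroid_probe(train, test):
    """Fit per-class mean vectors on `train` and return accuracy on `test`.

    Both arguments are lists of (vector, label) pairs.
    """
    sums, counts = {}, {}
    for vec, label in train:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {
        label: [s / counts[label] for s in acc] for label, acc in sums.items()
    }

    def predict(vec):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda label: sum(
                (a - b) ** 2 for a, b in zip(vec, centroids[label])
            ),
        )

    correct = sum(predict(vec) == label for vec, label in test)
    return correct / len(test)


def synthetic_layer(labels, signal, rng, dim=8):
    """Hypothetical stand-in for layer activations.

    Each vector is Gaussian noise; the first dimension is shifted by
    `signal * label`, so `signal` controls how strongly the property is encoded.
    """
    return [
        (
            [rng.gauss(0, 1) + (signal * label if i == 0 else 0.0)
             for i in range(dim)],
            label,
        )
        for label in labels
    ]


rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(400)]  # e.g. a binary proxy label

# Hypothetical layers: one that retains the property, one that does not.
layers = {"layer_with_property": 4.0, "layer_without_property": 0.0}

scores = {}
for name, signal in layers.items():
    data = synthetic_layer(labels, signal, rng)
    scores[name] = nearest_centroid_probe(data[:300], data[300:])

for name, accuracy in scores.items():
    print(f"{name}: probe accuracy = {accuracy:.2f}")
```

Under this reading, a layer whose probe accuracy stays near chance would be considered invariant to the property, while high accuracy suggests the property is still recoverable from that layer.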

Authors

Shammur A. Chowdhury, Ahmed Ali, Suwon Shon, James Glass

Topics

Artificial Intelligence > Core AI > Interpretability

Machine Learning > Core Methods > Classification

Machine Learning > Application Areas > Domain Adaptation

Interdisciplinary > Linguistics > Computational Linguistics

Speech & Audio > Analysis > Speech Analysis

Keywords

representation learning, neural network interpretability, speech analysis, speaker identity, dialect identification, end-to-end model, speaker invariance, channel information, speaker-invariant representation
