2016 INTERSPEECH INTERSPEECH 2016

Automatic Classification of Lexical Stress in English and Arabic Languages Using Deep Learning

Abstract

Prosodic features are important for the intelligibility and proficiency of stress-timed languages such as English and Arabic. Producing the appropriate lexical stress is challenging for second language (L2) learners, in particular, those whose first language (L1) is a syllable-timed language such as Spanish, French, etc. In this paper we introduce a method for automatic classification of lexical stress to be integrated into computer-aided pronunciation learning (CAPL) tools for L2 learning. We trained two different deep learning architectures, the deep feedforward neural network (DNN) and the deep convolutional neural network (CNN) using a set of temporal and spectral features related to the intensity, duration, pitch and energies in different frequency bands. The system was applied on both English (kids and adult) and Arabic (adult) speech corpora collected from native speakers. Our method results in error rates of 9%, 7% and 18% when tested on the English children corpus, English adult corpus and Arabic adult corpus respectively.

πŸš€ Conference Pioneer β€” INTERSPEECH 2016
🧭 Keyword Pioneer β€” spectral feature
🐝 Cross-Pollinator β€” Artificial Intelligence, Computer Vision, Data Science & Analytics, Deep Learning, Machine Learning, Speech & Audio
πŸŒ‰ Interdisciplinary Bridge β€” Deep Learning and Machine Learning and Speech & Audio