Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions

Vikram Ramanarayanan; Patrick L. Lange; Keelan Evanini; Hillary R. Molloy; David Suendermann-Oeft

INTERSPEECH 2017

Abstract

We present a spoken dialog-based framework for computer-assisted language learning (CALL) of conversational English. In particular, we leveraged the open-source HALEF dialog framework to develop a job interview conversational application. We then used crowdsourcing to collect multiple interactions with the system from non-native English speakers. We analyzed human-rated scores of the recorded dialog data on three scoring dimensions critical to the delivery of conversational English (fluency, pronunciation, and intonation/stress) and further examined the efficacy of automatically extracted, hand-curated speech features in predicting each of these sub-scores. Machine learning experiments showed that trained scoring models generally perform on par with the human inter-rater agreement baseline in predicting human-rated scores of conversational proficiency.

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Core Methods > Regression

Keywords

pronunciation assessment, speech scoring, computer-assisted language learning, spoken dialog system, intonation analysis
