INTERSPEECH 2017

Canonical Correlation Analysis and Prediction of Perceived Rhythmic Prominences and Pitch Tones in Speech

Abstract

Speech prosody encodes information about language and communicative intent as well as speaker identity and state. Consequently, a host of speech technologies could benefit from an increased understanding of prosodic phenomena and their corresponding acoustics. A recently developed comprehensive prosodic transcription system called RaP (Rhythm-and-Pitch) annotates both perceived rhythmic prominences and pitch tones in speech. Using RaP-annotated speech corpora, the present work analyzes relationships between perceived prosodic events and acoustic features, including syllable duration and novel measures of intensity and fundamental frequency. Canonical Correlation Analysis (CCA) reveals two dominant prosodic dimensions relating the acoustic features and RaP annotations. The first captures perceived prosodic emphasis of syllables, indicated by strong metrical beats and significant pitch variability (i.e., the presence of either high or low pitch tones). Acoustically, this dimension is characterized most strongly by syllable duration, followed by the mean intensity and fundamental frequency measures. The second CCA dimension primarily discriminates pitch tone level (high versus low), indicated mainly by the mean fundamental frequency measure. Finally, within a leave-one-out cross-validation framework, RaP prosodic events are well predicted from acoustic features (AUC between 0.78 and 0.84). Future work will exploit automated RaP labelling in contexts ranging from language learning to neurological disorder recognition.
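
As an illustrative sketch of the CCA step described above, the Python snippet below relates a small set of per-syllable acoustic features to binary prosodic annotations. It is not the authors' implementation and does not use the RaP corpora: the data are synthetic, and the names `cca`, `emphasis`, `pitch_level`, `beat`, `high`, and `low` are hypothetical stand-ins chosen for illustration. CCA is computed here with a standard QR-plus-SVD formulation, assuming both feature matrices have full column rank.

```python
import numpy as np

def cca(X, Y):
    """Canonical correlation analysis via QR and SVD.

    Assumes X (n x p) and Y (n x q) have full column rank.
    Returns weight matrices A, B and the canonical correlations.
    """
    Xc = X - X.mean(axis=0)            # center each feature
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)          # orthonormal bases of the two column spaces
    Qy, Ry = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    k = min(X.shape[1], Y.shape[1])
    A = np.linalg.solve(Rx, U[:, :k])      # weights for the X-side variates
    B = np.linalg.solve(Ry, Vt.T[:, :k])   # weights for the Y-side variates
    return A, B, s[:k]

# Synthetic stand-in data (assumption: two latent factors drive everything).
rng = np.random.default_rng(0)
n = 500
emphasis = rng.normal(size=n)      # hypothetical latent prosodic emphasis
pitch_level = rng.normal(size=n)   # hypothetical latent pitch height

# Acoustic side: syllable duration, mean intensity, mean fundamental frequency.
duration = emphasis + 0.5 * rng.normal(size=n)
intensity = 0.7 * emphasis + 0.7 * rng.normal(size=n)
f0 = 0.5 * emphasis + pitch_level + 0.5 * rng.normal(size=n)
X = np.column_stack([duration, intensity, f0])

# Annotation side: strong beat, high tone, low tone (binary indicators).
beat = (emphasis + 0.5 * rng.normal(size=n) > 0).astype(float)
has_tone = emphasis + 0.5 * rng.normal(size=n) > 0
high = (has_tone & (pitch_level > 0)).astype(float)
low = (has_tone & (pitch_level <= 0)).astype(float)
Y = np.column_stack([beat, high, low])

A, B, corrs = cca(X, Y)

# First pair of canonical variates; their correlation equals corrs[0].
u = (X - X.mean(axis=0)) @ A[:, 0]
v = (Y - Y.mean(axis=0)) @ B[:, 0]

print("canonical correlations:", corrs)
print("acoustic weights, dimension 1:", A[:, 0])
print("annotation weights, dimension 1:", B[:, 0])
```

Inspecting the columns of `A` and `B` shows which acoustic features and which annotation types load on each canonical dimension, which is the kind of reading the abstract gives for its two dominant dimensions. The signs of the weights are arbitrary, so only their relative magnitudes and the agreement of signs across the two sides are meaningful.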
