INTERSPEECH 2018

Automated Classification of Children’s Linguistic versus Non-Linguistic Vocalisations

Abstract

A key outstanding task for speech technology involves dealing with non-standard speakers, notably young children. Distinguishing children's linguistic from non-linguistic vocalisations is crucial for a number of applied and fundamental research goals, yet few systems are available for such a classification. This paper investigates three approaches: (i) two large-scale frame-level acoustic feature sets (eGeMAPS and ComParE16) followed by a dynamic model (GRU-RNN); (ii) two kinds of static feature sets derived at the segment level (functional-based and Bag of Audio Words) combined with a static model (SVM); and (iii) representations learnt automatically from the raw voice signal by an end-to-end system. All are compared against a simple phonetically-inspired baseline. The systems are applied to a large database of children's vocalisations (total N = 6,298) drawn from daylong recordings gathered in Namibia, Bolivia and Vanuatu. All of the systems outperform the baseline, with the highest test-set performance obtained by the GRU-RNN using ComParE16 features. We identify promising paths for further research, including applying a finer-grained classification of children's vocalisations to these data and exploring other feature systems.
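The distinction between frame-level and segment-level (functional-based) features can be illustrated with a minimal sketch. It assumes a segment is a list of equal-length frame vectors and uses only four statistics (mean, standard deviation, minimum, maximum); the function name `functionals` and the toy values are hypothetical, and the functional sets actually used with eGeMAPS and ComParE16 are considerably larger.

```python
from statistics import fmean, pstdev


def functionals(frames):
    """Summarise a variable-length segment as one fixed-length vector.

    `frames` is a list of frame-level feature vectors (all of the same
    dimensionality). For every feature dimension we compute four
    statistics over time, so the output length is 4 * n_features
    regardless of how many frames the segment contains.
    """
    if not frames:
        raise ValueError("segment has no frames")
    columns = list(zip(*frames))  # one tuple per feature dimension
    static = []
    for col in columns:
        static.extend([fmean(col), pstdev(col), min(col), max(col)])
    return static


# Two toy segments of different duration (2 and 4 frames, 2 features each).
short_segment = [[1.0, 10.0], [3.0, 10.0]]
long_segment = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0], [2.0, 4.0]]

vec_short = functionals(short_segment)
vec_long = functionals(long_segment)
```

Both segments map to vectors of the same length, which is what allows a static classifier such as an SVM to be trained on segments of varying duration, whereas a dynamic model such as a GRU-RNN consumes the frame sequence directly.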
