INTERSPEECH 2024

Measuring acoustic dissimilarity of hierarchical markers in task-oriented dialogue with MFCC-based dynamic time warping

Abstract

Joint activities (e.g. building a LEGO model) unfold in a hierarchy of subprojects. Navigating them involves horizontally elaborating on a subproject (placing one block) and vertically moving to a new subproject (the next block). Interactants coordinate horizontal and vertical transitions with project markers (okay, yeah). We suggest that vertical and horizontal transitions are distinguished both lexically and acoustically. We predicted that identical markers used for different transitions (okay-vertical vs. okay-horizontal) would be acoustically more dissimilar than identical markers used for the same transition (okay-vertical vs. okay-vertical). We measured the dissimilarity between vocalisations with MFCC-based dynamic time warping and analysed these measurements with a Bayesian regression model. We find that Vietnamese speakers use both lexical and acoustic cues to mark transitions, and that paired same-horizontal markers are acoustically more similar than same-vertical pairs and different-transition pairs.
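
The pairwise comparison described above can be illustrated with a minimal sketch. The abstract does not specify the MFCC configuration, the frame-level distance, the DTW step pattern or any length normalisation, so the code below assumes that MFCC frames have already been extracted for each marker token (e.g. with librosa.feature.mfcc), uses Euclidean distance between frames and the standard symmetric step pattern, and divides the accumulated cost by the summed token lengths. The function names and the pair labels ("same-horizontal", "same-vertical", "different") are hypothetical, and the Bayesian regression step is not sketched.

```python
import math
from itertools import combinations
from statistics import mean


def frame_distance(a, b):
    """Euclidean distance between two MFCC frames (equal-length coefficient vectors)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping between two sequences of MFCC frames.

    Returns the minimal accumulated frame distance along a monotonic
    alignment path, divided by len(seq_a) + len(seq_b). This normalisation
    is an assumption, not necessarily the one used in the paper.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m] / (n + m)


def pair_type(label_a, label_b):
    """Hypothetical pair label: 'same-horizontal', 'same-vertical' or 'different'."""
    if label_a != label_b:
        return "different"
    return "same-" + label_a


def mean_dissimilarity_by_pair_type(tokens):
    """Average DTW dissimilarity for every pair of tokens of one marker (e.g. okay).

    tokens: list of (transition_label, mfcc_frames) tuples, where
    transition_label is 'horizontal' or 'vertical' (hypothetical labels).
    """
    groups = {}
    for (lab_a, seq_a), (lab_b, seq_b) in combinations(tokens, 2):
        key = pair_type(lab_a, lab_b)
        groups.setdefault(key, []).append(dtw_distance(seq_a, seq_b))
    return {key: mean(values) for key, values in groups.items()}


if __name__ == "__main__":
    # Toy one-coefficient "MFCC" frames, for illustration only (not real data).
    toy_tokens = [
        ("horizontal", [[0.0], [1.0]]),
        ("horizontal", [[0.0], [1.0], [1.0]]),  # same shape, time-stretched
        ("vertical", [[5.0], [5.0]]),
    ]
    print(mean_dissimilarity_by_pair_type(toy_tokens))
```

Because DTW aligns frames before comparing them, the two toy horizontal tokens, which differ only in duration, come out with zero dissimilarity. In an analysis like the one described, per-pair distances (rather than the group means computed here) would be the values passed to the regression model.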
