PSST! Prosodic Speech Segmentation with Transformers

Nathan Roll; Calbert Graham; Simon Todd

2023 CONLL CoNLL 2023

PSST! Prosodic Speech Segmentation with Transformers

Abstract

AbstractWe develop and probe a model for detecting the boundaries of prosodic chunks in untranscribed conversational English speech. The model is obtained by fine-tuning a Transformer-based speech-to-text (STT) model to integrate the identification of Intonation Unit (IU) boundaries with the STT task. The model shows robust performance, both on held-out data and on out-of-distribution data representing different dialects and transcription protocols. By evaluating the model on degraded speech data, and comparing it with alternatives, we establish that it relies heavily on lexico-syntactic information inferred from audio, and not solely on acoustic information typically understood to cue prosodic structure. We release our model as both a transcription tool and a baseline for further improvements in prosodic segmentation.

🌉 Interdisciplinary Bridge — Deep Learning and Natural Language Processing

🧭 Keyword Pioneer — intonation unit

🐝 Cross-Pollinator — Artificial Intelligence, Deep Learning, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Speech & Audio