INTERSPEECH 2023

Improved Contextualized Speech Representations for Tonal Analysis

Abstract

We propose fine-tuning wav2vec 2.0 with a cross-entropy loss to classify the tone of each frame in an utterance. Our study demonstrates that this approach not only improves tone classification accuracy but also yields frame-level representations suitable for tonal analysis. Using these representations, we established that the rising tone produced by third-tone sandhi in Mandarin differs from the lexical rising tone (Tone 2), and that third tones which occur in a sandhi context but do not undergo sandhi differ from third tones outside a sandhi context. These findings suggest that third-tone sandhi in Mandarin Chinese involves a continuous shift from Tone 3 to Tone 2 rather than a categorical change.
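The frame-level training objective described above can be sketched as follows: each output frame of the encoder receives a tone label, and the loss is the cross-entropy averaged over all frames. This is a minimal, framework-free sketch under stated assumptions, not the authors' implementation. The 20 ms frame stride reflects wav2vec 2.0's convolutional feature encoder at 16 kHz; the label inventory (0 for toneless frames, 1 to 4 for the lexical tones, 5 for the neutral tone), the (start, end, tone) interval format, and the helper names frame_labels and frame_cross_entropy are hypothetical. The per-frame logits would presumably come from a classification layer on top of the fine-tuned encoder, but the abstract does not specify that layer.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: wav2vec 2.0 emits one frame per 320 samples at 16 kHz (20 ms).
FRAME_STRIDE_S = 0.02

# Hypothetical label set: 0 = no tone (silence, onset), 1-4 = lexical tones,
# 5 = neutral tone.
NUM_CLASSES = 6


def frame_labels(intervals: Sequence[Tuple[float, float, int]],
                 num_frames: int) -> List[int]:
    """Give each frame the tone of the (start, end, tone) interval that
    contains the frame's centre time; frames outside every interval get 0."""
    labels = [0] * num_frames
    for i in range(num_frames):
        centre = (i + 0.5) * FRAME_STRIDE_S
        for start, end, tone in intervals:
            if start <= centre < end:
                labels[i] = tone
                break
    return labels


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one frame's class scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def frame_cross_entropy(logits: Sequence[Sequence[float]],
                        labels: Sequence[int]) -> float:
    """Mean negative log-likelihood of the gold tone, averaged over frames."""
    if len(logits) != len(labels) or not labels:
        raise ValueError("need one non-empty label per frame")
    total = 0.0
    for frame_logits, y in zip(logits, labels):
        total -= log_softmax(frame_logits)[y]
    return total / len(labels)


if __name__ == "__main__":
    # A 100 ms toy utterance: 40 ms without tone, then a Tone 3 syllable.
    labels = frame_labels([(0.0, 0.04, 0), (0.04, 0.1, 3)], num_frames=5)
    print(labels)  # [0, 0, 3, 3, 3]
```

Labelling by frame centre keeps exactly one label per frame, which matches a per-frame softmax classifier; with uniform logits the loss equals log(NUM_CLASSES), a useful sanity check before training.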
