INTERSPEECH 2023

Vowel Normalisation in Latent Space for Sociolinguistics

Abstract

To study variations in vowel sounds between different sociolinguistic groups, sounds must be normalised to minimise variations caused by physical factors. The Lobanov method, for example, standardises formant distributions by speaker. Since formants are often difficult to measure, and offer only a partial description of sounds, a robust and reproducible normalisation method based on the whole spectrum would be useful. One candidate is speaker-level standardisation in the latent space of a variational auto-encoder trained on a large sample of vowel spectra. We show that whole-spectrum transformations induced by latent normalisation shift formants similarly to direct formant normalisation. We also show that formant-based normalisation procedures can be used to induce whole-spectrum transformations via the latent space.
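As an illustration of the speaker-level standardisation mentioned above, the following is a minimal sketch of Lobanov-style z-scoring, in which each feature dimension is centred and scaled per speaker. The function name `standardise_by_speaker`, the `eps` guard, and the toy formant values are assumptions made for this example and do not come from the paper. The same operation could in principle be applied to latent vectors produced by an encoder, which is not shown here.

```python
import numpy as np


def standardise_by_speaker(features, speaker_ids, eps=1e-8):
    """Z-score each feature dimension separately for every speaker.

    features:    array of shape (n_tokens, n_dims), e.g. [F1, F2] in Hz
                 or latent codes (hypothetical input for this sketch).
    speaker_ids: array of shape (n_tokens,) labelling the speaker of each token.
    eps:         small constant to avoid division by zero (an assumption).
    """
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    out = np.empty_like(features)
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        mean = features[mask].mean(axis=0)  # per-speaker mean of each dimension
        std = features[mask].std(axis=0)    # per-speaker standard deviation
        out[mask] = (features[mask] - mean) / (std + eps)
    return out


# Toy data (made up): two vowel tokens each from two speakers, columns are F1 and F2.
formants = np.array([
    [300.0, 2300.0],
    [700.0, 1200.0],
    [350.0, 2600.0],
    [800.0, 1400.0],
])
speakers = np.array(["a", "a", "b", "b"])

normalised = standardise_by_speaker(formants, speakers)
```

After this step, each speaker's features have approximately zero mean and unit variance in every dimension, so differences in overall range between speakers are removed while the relative positions of vowels within a speaker are preserved.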


Authors