2017
INTERSPEECH
Improving Prediction of Speech Activity Using Multi-Participant Respiratory State
Abstract
One consequence of situated face-to-face conversation is the co-observability of participants’ respiratory movements and sounds. We explore whether this information can be exploited in predicting incipient speech activity. Using a methodology called stochastic turn-taking modeling, we compare the performance of a model trained on speech activity alone to one additionally trained on static and dynamic lung volume features. The methodology permits automatic discovery of temporal dependencies across participants and feature types. Our experiments show that respiratory information substantially lowers cross-entropy rates, and that this generalizes to unseen data.
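The comparison described in the abstract rests on cross-entropy rate: the average negative log-likelihood a model assigns to the observed speech activity, where lower is better. The sketch below shows one way such a rate could be computed in bits per frame for binary speech/non-speech labels. The function name `cross_entropy_rate`, the frame-level framing, and all numbers are hypothetical illustrations, not the paper's model, features, or results.

```python
import math


def cross_entropy_rate(probs, labels):
    """Average negative log2-likelihood, in bits per frame.

    probs  -- predicted probability of speech (label 1) for each frame
    labels -- observed speech activity per frame (1 = speech, 0 = silence)
    """
    if len(probs) != len(labels) or not labels:
        raise ValueError("probs and labels must be non-empty and equal in length")
    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp to avoid log(0) when a model is overconfident.
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total -= math.log2(p if y == 1 else 1.0 - p)
    return total / len(labels)


# Toy data (made up for illustration only).
labels = [0, 0, 1, 1, 1, 0]
baseline_probs = [0.5] * 6                        # an uninformed model
informed_probs = [0.2, 0.3, 0.7, 0.8, 0.9, 0.1]   # a model with extra features

baseline_rate = cross_entropy_rate(baseline_probs, labels)
informed_rate = cross_entropy_rate(informed_probs, labels)
print(f"baseline: {baseline_rate:.3f} bits/frame")
print(f"informed: {informed_rate:.3f} bits/frame")
```

An uninformed model that always predicts 0.5 costs exactly one bit per frame, so any rate below that on held-out data indicates the model has captured some predictive structure.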