Combined Speaker Clustering and Role Recognition in Conversational Speech

Nikolaos Flemotomos; Pavlos Papadopoulos; James Gibson; Shrikanth Narayanan

INTERSPEECH 2018

Abstract

Speaker Role Recognition (SRR) is usually addressed either as an independent classification task or as a subsequent step after a speaker clustering module. However, the first approach does not take speaker-specific variability into account, while the second one suffers from error propagation. In this work, we propose integrating an audio-based speaker clustering algorithm with a language-aided role recognizer into a meta-classifier that takes both modalities into account. That way, speaker-specific and role-specific characteristics can be treated separately before the relevant information is combined. The method is evaluated on two corpora of clinician-patient interactions recorded under different conditions, and it is shown to yield superior results for the SRR task.
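To make the contrast in the abstract concrete, the following is a minimal sketch of one possible way to fuse the two information sources. It is not the authors' meta-classifier, whose details are not given on this page. The function name `fuse_roles`, the `weight` parameter, and the input layout (one audio cluster label and one dictionary of text-based role log-probabilities per segment) are all assumptions made for illustration.

```python
from collections import defaultdict

# The two roles named in the abstract; everything else here is hypothetical.
ROLES = ("clinician", "patient")


def fuse_roles(cluster_ids, role_logprobs, weight=0.5):
    """Label each segment with a role using both audio and language evidence.

    cluster_ids   -- per-segment speaker cluster label (assumed to come from
                     an audio-based clustering step)
    role_logprobs -- per-segment dict mapping each role to a log-probability
                     (assumed to come from a text-based role model)
    weight        -- illustrative interpolation weight in [0, 1]:
                     0 ignores the clusters (independent classification),
                     1 uses only cluster-level evidence (a cascade-like setup)
    """
    # Pool the text-based evidence over all segments of the same cluster.
    pooled = defaultdict(lambda: {r: 0.0 for r in ROLES})
    counts = defaultdict(int)
    for c, lp in zip(cluster_ids, role_logprobs):
        counts[c] += 1
        for r in ROLES:
            pooled[c][r] += lp[r]

    labels = []
    for c, lp in zip(cluster_ids, role_logprobs):
        # Interpolate the cluster-level mean score with the segment's own score.
        fused = {
            r: weight * pooled[c][r] / counts[c] + (1 - weight) * lp[r]
            for r in ROLES
        }
        labels.append(max(fused, key=fused.get))
    return labels
```

With `weight=0` each segment is classified on its own text, which mirrors the independent-classification setup; with `weight=1` every segment inherits the pooled decision of its cluster, so a clustering mistake carries over to the role label. Intermediate values let both sources contribute, which is the general idea the abstract describes.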

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Core Methods > Clustering
Machine Learning > Application Areas > Domain Adaptation

Keywords

multimodal learning, error propagation, speaker diarization, speaker clustering, role recognition
