A Thousand Words are Worth More Than One Recording: Word-Embedding Based Speaker Change Detection

Or Haim Anidjar; Itshak Lapidot; Chen Hajaj; Amit Dvir

INTERSPEECH 2021

Abstract

Speaker Change Detection (SCD) is the task of segmenting an input audio recording according to speaker interchanges. This task is essential for many applications, such as automatic voice transcription or Speaker Diarization (SD). This paper focuses on audio segmentation and suggests a word-embedding-based solution for the SCD problem. Moreover, we show how to use our approach to outperform voice-based solutions for the SD problem. We empirically show that our method can accurately identify the speaker turns in an audio recording, achieving 82.12% Recall and an 89.02% F1-score.

Authors

Or Haim Anidjar, Itshak Lapidot, Chen Hajaj, Amit Dvir

Topics

Machine Learning > Core Methods > Embedding Learning

Keywords

word embedding, speaker change detection, audio segmentation
