2022 INTERSPEECH INTERSPEECH 2022

Overlapped Speech Detection in Broadcast Streams Using X-vectors

Abstract

A new approach to overlapped speech detection (OSD) is introduced in this work. It is designed for real-time processing of streamed data and utilizes x-vectors as its input features. It thus allows us to reduce computational demands within the entire streaming data processing chain, where the same x-vectors can also be used for the related task of speaker diarization. Within our method, the x-vectors are extracted using a feed-forward sequential memory network (FSMN) and then fed into a simple neural classifier (speech or cross-talk), whose output is smoothed by a decoder based on weighted finite-state transducers (WFSTs). The evaluation is done on a Czech/Slovak broadcast dataset (we make this data public) and on the AMI meeting corpus. Our online method yields a solid performance while operating with a 2-second latency.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio