INTERSPEECH 2023

Overlap Aware Continuous Speech Separation without Permutation Invariant Training

Abstract

Continuous speech separation (CSS) aims to separate a long-form signal containing multiple partially overlapped utterances into a set of non-overlapped speech signals. While most existing CSS methods rely on the permutation invariant training (PIT) algorithm for training and inference, we argue that PIT may not be needed at all to achieve promising CSS performance. In this paper, we propose a novel overlap-aware CSS method, which explicitly identifies the non-overlapped segments in the long-form input and uses them to guide the separation of the overlapped segments. We show that, with the help of an external overlapping speech detection (OSD) model, an overlap-aware CSS model can be trained without PIT. In addition, we propose an overlap-aware inference algorithm that greatly reduces the computational cost while preserving strong performance. Experimental results show that the proposed methods outperform the conventional stitching-based CSS approach, with an improvement of over 1 dB in signal-to-noise ratio (SNR).
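
The computational saving of overlap-aware inference can be illustrated with a minimal sketch: only the frames that an OSD model flags as overlapped are sent through the separator, and the rest of the signal is passed through unchanged. This is an illustration under stated assumptions, not the algorithm of the paper. The function names, the two-channel output, the `hop` of 160 samples per frame, the 0.5 threshold, and the `separate_fn` callable are all hypothetical, and the sketch does not model how non-overlapped segments guide the separation or keep speakers on consistent output channels.

```python
import numpy as np

def overlapped_segments(osd_probs, threshold=0.5):
    """Return [start, end) frame ranges where the OSD probability exceeds threshold."""
    mask = np.asarray(osd_probs) > threshold
    # Pad with False on both sides so every run has a rising and a falling edge.
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(edges[0::2], edges[1::2])]

def overlap_aware_inference(mixture, osd_probs, separate_fn, hop=160, threshold=0.5):
    """Run separate_fn only on overlapped regions (hypothetical simplification).

    separate_fn maps a 1-D chunk of n samples to an array of shape (2, n).
    Non-overlapped audio is copied to channel 0; this placeholder ignores
    speaker-to-channel consistency, which a real system has to handle.
    """
    mixture = np.asarray(mixture, dtype=float)
    out = np.zeros((2, mixture.shape[0]))
    out[0] = mixture  # pass-through for regions with at most one speaker
    for start, end in overlapped_segments(osd_probs, threshold):
        a = start * hop
        b = min(end * hop, mixture.shape[0])
        out[:, a:b] = separate_fn(mixture[a:b])  # separator runs here only
    return out
```

In this sketch the cost of separation scales with the amount of overlapped audio rather than with the full recording length, which is the intuition behind the reduced computation claimed above.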
