2017
INTERSPEECH
INTERSPEECH 2017
LSTM Neural Network-Based Speaker Segmentation Using Acoustic and Language Modelling
Abstract
This paper presents a new speaker change detection system based on Long Short-Term Memory (LSTM) neural networks using acoustic data and linguistic content. Language modelling is combined with two different Joint Factor Analysis (JFA) acoustic approaches: i-vectors and speaker factors. Both of them are compared with a baseline algorithm that uses cosine distance to detect speaker turn changes. LSTM neural networks with both linguistic and acoustic features have been able to produce a robust speaker segmentation. The experimental results show that our proposal clearly outperforms the baseline system.
🌉
Interdisciplinary Bridge
— Deep Learning and Machine Learning
🧭
Keyword Pioneer
— language modelling
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Deep Learning, Healthcare & Medicine, Machine Learning, Natural Language Processing, Speech & Audio