INTERSPEECH 2016

Semi-Supervised Speaker Adaptation for In-Vehicle Speech Recognition with Deep Neural Networks

Abstract

In this paper, we present a new i-vector based speaker adaptation method for automatic speech recognition with deep neural networks, focusing on in-vehicle scenarios. Rather than appending i-vectors to acoustic feature vectors to form concatenated input vectors for adapting the neural network acoustic model parameters, our proposed method performs a feature-space transformation with smaller transformation neural networks dedicated to acoustic feature vectors and i-vectors, respectively, followed by a layer that linearly combines the network outputs. This feature-space transformation is learned via semi-supervised learning without any parameter change in the original deep neural network acoustic model. Experimental results show that our proposed method achieves an 18.3% relative improvement in word error rate over the speaker-independent performance, and verify that it has the potential to replace the well-known feature-space Maximum Likelihood Linear Regression (fMLLR) in in-vehicle speech recognition with deep neural networks.
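
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the single tanh hidden layer, the random initialization, and all names (`small_net`, `transform_features`, `FEAT_DIM`, `IVEC_DIM`, `HIDDEN`) are assumptions made for illustration, and the semi-supervised training step is omitted. The sketch shows only the forward pass: one small network per input stream, then a linear layer combining the two outputs into adapted features for an unchanged acoustic model.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Small random weights and zero biases for one affine layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def small_net(x, params):
    """One hidden tanh layer followed by a linear output layer (assumed shape)."""
    (w1, b1), (w2, b2) = params
    return np.tanh(x @ w1 + b1) @ w2 + b2

# Hypothetical dimensions, chosen only for illustration.
FEAT_DIM, IVEC_DIM, HIDDEN = 40, 100, 64

# One small transformation network per input stream.
feat_net = [init_layer(FEAT_DIM, HIDDEN), init_layer(HIDDEN, FEAT_DIM)]
ivec_net = [init_layer(IVEC_DIM, HIDDEN), init_layer(HIDDEN, FEAT_DIM)]
# Linear layer that combines the two network outputs.
combine = init_layer(2 * FEAT_DIM, FEAT_DIM)

def transform_features(frames, ivector):
    """Map (T, FEAT_DIM) frames and one (IVEC_DIM,) i-vector to adapted frames."""
    h_feat = small_net(frames, feat_net)             # (T, FEAT_DIM)
    h_ivec = small_net(ivector[None, :], ivec_net)   # (1, FEAT_DIM)
    h_ivec = np.broadcast_to(h_ivec, h_feat.shape)   # same i-vector for every frame
    w, b = combine
    return np.concatenate([h_feat, h_ivec], axis=1) @ w + b

frames = rng.normal(size=(5, FEAT_DIM))   # five stand-in acoustic frames
ivector = rng.normal(size=IVEC_DIM)       # stand-in speaker i-vector
adapted = transform_features(frames, ivector)
print(adapted.shape)  # (5, 40)
```

In this arrangement only `feat_net`, `ivec_net`, and `combine` would be updated during adaptation. The adapted frames keep the original feature dimension, so they can be fed to the existing acoustic model without changing any of its parameters.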
