Extracting Speaker-Specific Information with a Regularized Siamese Deep Network

Ke Chen; Ahmad Salman

NIPS (NeurIPS) 2011

Abstract

Speech conveys different yet mixed information ranging from linguistic to speaker-specific components, and each of them should be exclusively used in a specific task. However, it is extremely difficult to extract a specific information component given the fact that nearly all existing acoustic representations carry all types of speech information. Thus, the use of the same representation in both speech and speaker recognition hinders a system from producing better performance due to interference of irrelevant information. In this paper, we present a deep neural architecture to extract speaker-specific information from MFCCs. As a result, a multi-objective loss function is proposed for learning speaker-specific characteristics and regularization via normalizing interference of non-speaker related information and avoiding information loss. With LDC benchmark corpora and a Chinese speech corpus, we demonstrate that a resultant speaker-specific representation is insensitive to text/languages spoken and environmental mismatches and hence outperforms MFCCs and other state-of-the-art techniques in speaker recognition. We discuss relevant issues and relate our approach to previous work.
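To make the idea of a multi-objective Siamese loss concrete, here is a minimal sketch. It is not the loss defined in the paper: it combines a generic margin-based contrastive term (pull embeddings of the same speaker together, push different speakers apart) with a reconstruction term standing in for the "avoiding information loss" regularizer. The function names, the `margin` and `alpha` parameters, and the weighting scheme are all assumptions made for illustration.

```python
import numpy as np

def contrastive_term(z1, z2, same, margin=1.0):
    """Generic margin-based contrastive loss over a batch of embedding pairs.

    z1, z2: arrays of shape (batch, dim), one embedding per Siamese branch.
    same:   array of shape (batch,), 1 if the pair shares a speaker, else 0.
    """
    d = np.linalg.norm(z1 - z2, axis=1)                  # Euclidean distance per pair
    pos = same * d ** 2                                  # same speaker: penalize distance
    neg = (1 - same) * np.maximum(0.0, margin - d) ** 2  # different: penalize closeness
    return float(np.mean(pos + neg))

def reconstruction_term(x, x_hat):
    """Mean squared reconstruction error (hypothetical information-loss regularizer)."""
    return float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))

def multi_objective_loss(z1, z2, same, x1, x1_hat, x2, x2_hat,
                         margin=1.0, alpha=0.5):
    """Weighted sum of the two objectives; alpha is an assumed trade-off weight."""
    recon = reconstruction_term(x1, x1_hat) + reconstruction_term(x2, x2_hat)
    return alpha * recon + (1.0 - alpha) * contrastive_term(z1, z2, same, margin)
```

In a real system `x1` and `x2` would be MFCC frames, and the embeddings and reconstructions would come from the two weight-sharing branches of the network; here they are plain arrays so the loss can be checked in isolation.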

Authors

Ke Chen, Ahmad Salman

Topics

Machine Learning > Core Methods > Metric Learning
Machine Learning > Learning Types > Contrastive Learning
Deep Learning > Architectures > Neural Networks
Speech & Audio > Recognition > Speaker Recognition
Deep Learning > Learning Types > Multi-Task Learning

Keywords

feature extraction; speaker verification; speaker recognition; speaker-specific information; siamese neural network; acoustic feature extraction; multi-objective learning; regularized deep network; acoustic representation; deep neural network; siamese network
