Learning Word Embeddings: Unsupervised Methods for Fixed-size Representations of Variable-length Speech Segments

Nils Holzenberger; Mingxing Du; Julien Karadayi; Rachid Riad; Emmanuel Dupoux

INTERSPEECH 2018

Abstract

Fixed-length embeddings of words are very useful for a variety of tasks in speech and language processing. Here we systematically explore two methods of computing fixed-length embeddings for variable-length sequences. We evaluate their susceptibility to phonetic and speaker-specific variability on English, a high-resource language, and Xitsonga, a low-resource language, using two evaluation metrics: ABX word discrimination and ROC-AUC on same-different phoneme n-grams. We show that a simple downsampling method supplemented with length information can outperform the variable-length input feature representation on both evaluations. Recurrent autoencoders, trained without supervision, can yield even better results at the expense of increased computational complexity.
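The downsampling baseline mentioned in the abstract can be illustrated with a short sketch. This is not the authors' code: it assumes nearest-frame sampling at `k` evenly spaced positions and appends the log of the segment's frame count as the length feature. The function name `downsample_embedding`, the default `k=10`, and the log-length choice are all hypothetical.

```python
import math
from typing import List, Sequence


def downsample_embedding(
    frames: Sequence[Sequence[float]], k: int = 10, add_length: bool = True
) -> List[float]:
    """Turn a variable-length segment (n frames x D features) into a fixed-size vector.

    Hypothetical helper: picks k frames at evenly spaced positions (nearest frame),
    concatenates them, and optionally appends log(n) as a duration feature.
    """
    n = len(frames)
    if n == 0:
        raise ValueError("segment has no frames")
    if k < 2:
        raise ValueError("k must be at least 2")
    # Evenly spaced positions from the first frame (0) to the last frame (n - 1),
    # rounded half-up to the nearest frame index; short segments repeat frames.
    indices = [math.floor(j * (n - 1) / (k - 1) + 0.5) for j in range(k)]
    vector = [value for i in indices for value in frames[i]]
    if add_length:
        # Sampling discards duration, so it is added back explicitly (assumed log scale).
        vector.append(math.log(n))
    return vector


# Usage: a 7-frame segment with 2-dimensional features, reduced to k = 4 frames.
segment = [[float(t), 2.0 * t] for t in range(7)]
embedding = downsample_embedding(segment, k=4)
print(len(embedding))  # 9 = 4 frames * 2 dims + 1 length feature
```

The output always has `k * D + 1` entries for `D`-dimensional frames, which is what lets segments of different durations be compared with a single vector distance.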
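ABX word discrimination asks whether a token X is closer to a token A of the same word than to a token B of a different word. The sketch below is a simplified version: it assumes cosine distance and pools all triplets into one error rate, whereas the paper's evaluation may group triplets by speaker or context conditions, which this sketch does not model. The names `abx_error` and `cosine_distance` are hypothetical.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def abx_error(embeddings: Sequence[Sequence[float]], labels: Sequence[str]) -> float:
    """Pooled ABX error over all (A, B, X) triplets.

    A and X are distinct tokens of the same word, B is a token of a different word.
    A triplet is an error if X is farther from A than from B; ties count as half.
    """
    errors = 0.0
    total = 0
    n = len(embeddings)
    for a in range(n):
        for x in range(n):
            if a == x or labels[a] != labels[x]:
                continue
            d_ax = cosine_distance(embeddings[a], embeddings[x])
            for b in range(n):
                if labels[b] == labels[a]:
                    continue
                d_bx = cosine_distance(embeddings[b], embeddings[x])
                if d_ax > d_bx:
                    errors += 1.0
                elif d_ax == d_bx:
                    errors += 0.5
                total += 1
    if total == 0:
        raise ValueError("need two tokens of one word and one token of another")
    return errors / total


# Usage: two well-separated words, two tokens each.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
words = ["cat", "cat", "dog", "dog"]
print(abx_error(vectors, words))  # 0.0: every X is closer to its own word
```

Lower is better: 0.0 means perfect discrimination and 0.5 is chance, as when all embeddings are identical.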
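The same-different evaluation scores every pair of segments and measures how well that score separates pairs with identical phoneme n-gram labels from all other pairs. A minimal sketch, assuming negative cosine distance as the pair score and computing ROC-AUC through its rank (Mann-Whitney) interpretation; `same_different_auc` and the example transcriptions are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def same_different_auc(
    embeddings: Sequence[Sequence[float]], labels: Sequence[str]
) -> float:
    """ROC-AUC of the score -cosine_distance for 'same' vs 'different' pairs.

    Equals the probability that a random same-label pair scores higher than a
    random different-label pair, with ties counted as 1/2.
    """
    same_scores, diff_scores = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        score = -cosine_distance(embeddings[i], embeddings[j])
        if labels[i] == labels[j]:
            same_scores.append(score)
        else:
            diff_scores.append(score)
    if not same_scores or not diff_scores:
        raise ValueError("need both same-label and different-label pairs")
    wins = 0.0
    for s in same_scores:
        for d in diff_scores:
            if s > d:
                wins += 1.0
            elif s == d:
                wins += 0.5
    return wins / (len(same_scores) * len(diff_scores))


# Usage: segments labelled by (hypothetical) phoneme trigram transcriptions.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
ngrams = ["k ae t", "k ae t", "d ao g", "d ao g"]
print(same_different_auc(vectors, ngrams))  # 1.0
```

The quadratic loop over pairs is only meant for small examples; for a real corpus, passing the pair scores and same/different labels to a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.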

Topics

Machine Learning > Core Methods > Embedding Learning
Machine Learning > Learning Types > Unsupervised Learning
Deep Learning > Architectures > Autoencoders

Keywords

unsupervised learning, word embedding, recurrent autoencoder, speech segment, variable-length input
