INTERSPEECH 2022
Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning
Abstract
We introduce a simple neural encoder architecture that can be trained with an unsupervised contrastive learning objective whose positive samples come from a data-augmented k-Nearest Neighbors search. We show that, when built on top of recent self-supervised audio representations, this method can be applied iteratively and yields competitive speech sequence embeddings (SSE), as evaluated on two tasks: query-by-example of random sequences of speech, and spoken term discovery. On both tasks, our method improves on the state of the art by a significant margin across 5 different languages. Finally, we establish a benchmark on a query-by-example task on the LibriSpeech dataset to monitor future improvements in the field.
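To make the training signal in the abstract concrete, here is a minimal sketch of a contrastive objective whose positives come from a nearest-neighbor search. It is not the authors' implementation. The function names (`cosine`, `knn_positives`, `info_nce`), the InfoNCE form of the loss, the temperature value, and the toy 2-D vectors are all assumptions made for illustration. The data augmentation step and the neural encoder are omitted, so the vectors stand in for already-computed sequence embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def knn_positives(embeddings, k):
    """For each item, return the indices of its k most similar other items."""
    positives = []
    for i, anchor in enumerate(embeddings):
        scored = [(cosine(anchor, other), j)
                  for j, other in enumerate(embeddings) if j != i]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        positives.append([j for _, j in scored[:k]])
    return positives

def info_nce(embeddings, positives, temperature=0.1):
    """Mean InfoNCE loss over all (anchor, positive) pairs.

    Each anchor is contrasted against every other item; items that are
    not the current positive act as negatives (hypothetical choice).
    """
    total, count = 0.0, 0
    for i, pos_list in enumerate(positives):
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(len(embeddings)) if j != i}
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits.values())
        log_denom = top + math.log(sum(math.exp(v - top) for v in logits.values()))
        for j in pos_list:
            total += log_denom - logits[j]
            count += 1
    return total / count

# Toy data: two loose clusters standing in for speech sequence embeddings.
embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
positives = knn_positives(embeddings, k=1)
loss = info_nce(embeddings, positives)
```

In an iterative scheme of the kind the abstract mentions, one would presumably retrain the encoder on this loss, re-embed the data, and rerun `knn_positives` on the new embeddings. The loop itself is left out here because the abstract does not specify its details.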
Authors
Topics
Machine Learning > Core Methods > Metric Learning
Machine Learning > Core Methods > Embedding Learning
Machine Learning > Learning Types > Contrastive Learning
Speech & Audio > Analysis > Speech Analysis
Deep Learning > Learning Types > Self-Supervised Learning
Deep Learning > Learning Types > Contrastive Learning