DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

Alexander H. Liu; Heng-Jui Chang; Michael Auli; Wei-Ning Hsu; Jim Glass

NeurIPS 2023

Abstract

In this paper, we introduce self-distillation and online clustering for self-supervised speech representation learning (DinoSR), which combines masked language modeling, self-distillation, and online clustering. We show that these concepts complement each other and result in a strong representation learning model for speech. DinoSR first extracts contextualized embeddings from the input audio with a teacher network, then runs an online clustering system on the embeddings to yield a machine-discovered phone inventory, and finally uses the discretized tokens to guide a student network. We show that DinoSR surpasses previous state-of-the-art performance in several downstream tasks, and provide a detailed analysis of the model and the learned discrete units.
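The three stages named in the abstract (teacher embeddings, online clustering into discrete tokens, a student trained on those tokens) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `OnlineCodebook`, `masked_cross_entropy`, and `ema_teacher` are hypothetical. The codebook update is assumed to keep exponential moving averages of per-codeword sums and counts, in the style of VQ-VAE codebooks; the teacher is assumed to be an exponential moving average of the student's weights; and the student loss is assumed to be cross-entropy computed over masked frames only.

```python
import math
import random


class OnlineCodebook:
    """Hypothetical online clustering codebook (assumed EMA-style updates)."""

    def __init__(self, num_codes, dim, decay=0.9, seed=0):
        rng = random.Random(seed)
        self.decay = decay
        # Running (EMA) sums of assigned embeddings and running counts per codeword.
        self.sums = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_codes)]
        self.counts = [1.0] * num_codes

    def codeword(self, k):
        """Current centroid of codeword k: EMA sum divided by EMA count."""
        return [s / self.counts[k] for s in self.sums[k]]

    def assign(self, z):
        """Index of the nearest codeword (squared L2 distance) for one teacher frame."""
        dists = [sum((a - b) ** 2 for a, b in zip(z, self.codeword(k)))
                 for k in range(len(self.counts))]
        return min(range(len(dists)), key=dists.__getitem__)

    def update(self, frames, assignments):
        """Move each codeword toward the frames assigned to it in this batch."""
        dim = len(self.sums[0])
        for k in range(len(self.counts)):
            members = [z for z, a in zip(frames, assignments) if a == k]
            batch_sum = [sum(z[d] for z in members) for d in range(dim)]
            self.sums[k] = [self.decay * s + (1 - self.decay) * b
                            for s, b in zip(self.sums[k], batch_sum)]
            self.counts[k] = self.decay * self.counts[k] + (1 - self.decay) * len(members)


def masked_cross_entropy(student_logits, targets, mask):
    """Mean cross-entropy over masked frames; logits are per-frame scores over codewords."""
    total, n = 0.0, 0
    for logits, t, m in zip(student_logits, targets, mask):
        if not m:
            continue  # unmasked frames contribute no loss in this sketch
        mx = max(logits)
        log_z = mx + math.log(sum(math.exp(l - mx) for l in logits))  # stable log-sum-exp
        total += log_z - logits[t]
        n += 1
    return total / max(n, 1)


def ema_teacher(teacher_params, student_params, decay=0.999):
    """Assumed self-distillation update: teacher weights track the student by EMA."""
    return [decay * t + (1 - decay) * s for t, s in zip(teacher_params, student_params)]
```

In a training step under these assumptions, the teacher encodes the unmasked audio, `assign` maps each teacher frame to a codeword index, `update` moves the codewords toward the frames assigned to them, the student sees the masked input and is scored with `masked_cross_entropy` against those indices, and `ema_teacher` then refreshes the teacher. Because a codeword with no assigned frames has its sum and count decayed by the same factor, its position stays unchanged rather than drifting toward the origin.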

Topics

Machine Learning > Core Methods > Clustering
Machine Learning > Learning Types > Contrastive Learning
Machine Learning > Learning Types > Self-Supervised Learning

Keywords

online clustering, self-supervised learning, masked language modeling, contextualized embedding, speech representation