Interpretable Sparse Features for Probing Self-Supervised Speech Models

Iñigo Parra

2025 AACL AACL 2025

Interpretable Sparse Features for Probing Self-Supervised Speech Models

Abstract

AbstractSelf-supervised speech models have demonstrated the ability to learn rich acoustic representations. However, interpreting which specific phonological or acoustic features these models leverage within their highly polysemantic activations remains challenging. In this paper, we propose a straightforward and unsupervised probing method for model interpretability. We extract the activations from the final MLP layer of a pretrained HuBERT model and train a sparse autoencoder (SAE) using dictionary learning techniques to generate an over-complete set of latent representations. Analyzing these latent codes, we observe that a small subset of high-variance units consistently aligns with phonetic events, suggesting their potential utility as interpretable acoustic detectors. Our proposed method does not require labeled data beyond raw audio, providing a lightweight and accessible tool to gain insights into the internal workings of self-supervised speech models.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Iñigo Parra

Topics

Artificial Intelligence > Core AI > Interpretability Machine Learning > Learning Types > Self-Supervised Learning Deep Learning > Architectures > Autoencoders

Keywords

dictionary learning model interpretability sparse autoencoder speech representation self-supervised speech model

Download PDF

Related papers

Judging the Judges: A Systematic Study of Position Bias in LLM-as-a-Judge 2025

Counterfactual Evaluation for Blind Attack Detection in LLM-based Evaluation Systems 2025

Enhancing Training Data Quality through Influence Scores for Generalizable Classification: A Case Study on Sexism Detection 2025

CtrlShift: Steering Language Models for Dense Quotation Retrieval with Dynamic Prompts 2025

A Diagnostic Framework for Auditing Reference-Free Vision-Language Metrics 2025