Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition

Jiachen Lian; Alan W Black; Louis Goldstein; Gopala Krishna Anumanchipalli

INTERSPEECH 2022

Abstract

Most research on data-driven speech representation learning has focused on raw audio in an end-to-end manner, paying little attention to its internal phonological or gestural structure. This work investigates speech representations derived from articulatory kinematics signals, using a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores. By applying sparsity constraints, the gestural scores leverage the discrete combinatorial properties of phonological gestures. Phoneme recognition experiments were additionally performed to show that gestural scores do encode phonological information. The proposed work thus builds a bridge between articulatory phonology and deep neural networks to obtain informative, intelligible, interpretable, and efficient speech representations. The code is publicly available at https://github.com/Berkeley-Speech-Group/ema_gesture.
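
As a rough illustration of the kind of decomposition the abstract describes, the sketch below uses the classic, non-neural convolutive matrix factorization model, in which the data matrix is approximated by a sum over time offsets t of W[t] times H shifted right by t frames. It is not the authors' neural implementation. The function names, the array shapes, and the L1 penalty weight `lam` are assumptions chosen for illustration only.

```python
import numpy as np

def shift_right(H, t):
    """Shift the columns of H right by t frames, zero-filling on the left."""
    out = np.zeros_like(H)
    if t == 0:
        out[:] = H
    else:
        out[:, t:] = H[:, :-t]
    return out

def reconstruct(W, H):
    """Convolutive reconstruction: X_hat = sum_t W[t] @ shift_right(H, t).

    W: (kernel_frames, channels, gestures), one spatio-temporal template per gesture.
    H: (gestures, frames), the activation matrix (the "gestural score").
    """
    return sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))

def objective(X, W, H, lam=0.1):
    """Squared reconstruction error plus an L1 sparsity penalty on H (assumed form)."""
    residual = X - reconstruct(W, H)
    return 0.5 * np.sum(residual ** 2) + lam * np.sum(np.abs(H))

# Illustrative sizes (assumptions): 3-frame gestures, 12 channels, 4 gestures, 10 frames.
rng = np.random.default_rng(0)
W = rng.random((3, 12, 4))
H = np.zeros((4, 10))
H[0, 2] = 1.0  # gesture 0 is activated once, at frame 2

X_hat = reconstruct(W, H)
```

With a single activation, the reconstruction is simply the template of gesture 0 pasted into frames 2 through 4, which is why a sparse H can be read as a score indicating which gesture starts when.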

Topics

Machine Learning > Core Methods > Clustering
Machine Learning > Core Methods > Representation Learning
Speech & Audio > Analysis > Speech Analysis

Keywords

phoneme recognition, speech representation, convolutive matrix factorization, articulatory kinematics, gesture decomposition, sparse constraint, neural network, articulatory phonology, gestural decomposition
