Self-supervised speech representations display some human-like cross-linguistic perceptual abilities

Joselyn Rodriguez; Kamala Sreepada; Ruolan Leslie Famularo; Sharon Goldwater; Naomi Feldman

CoNLL 2024

Abstract

State-of-the-art models in automatic speech recognition have improved remarkably thanks to modern self-supervised learning (SSL) transformer-based architectures such as wav2vec 2.0 (Baevski et al., 2020). However, how these models encode phonetic information is still not well understood. We explore whether SSL speech models display a linguistic property that characterizes human speech perception: language specificity. We show that while wav2vec 2.0 displays an overall language specificity effect when tested on Hindi vs. English, it does not resemble human speech perception when tested on finer-grained differences in Hindi speech contrasts.

Topics

Deep Learning > Architectures > Transformers
Interdisciplinary > Linguistics > Phonetics
Interdisciplinary > Cognitive Science > Perception

Keywords

self-supervised learning, speech representation, phonetic information, cross-linguistic perception
