CTC Variations Through New WFST Topologies

Aleksandr Laptev; Somshubra Majumdar; Boris Ginsburg

INTERSPEECH 2022

Abstract

This paper presents novel Weighted Finite-State Transducer (WFST) topologies to implement Connectionist Temporal Classification (CTC)-like algorithms for automatic speech recognition. Three new CTC variants are proposed: (1) "compact-CTC", in which direct transitions between units are replaced with 〈ε〉 back-off transitions; (2) "minimal-CTC", which only adds 〈blank〉 self-loops when used in WFST composition; and (3) "selfless-CTC", which disallows self-loops for non-blank units. Compact-CTC yields WFST decoding graphs that are 1.5 times smaller and halves memory consumption when training CTC models with the LF-MMI objective, without hurting recognition accuracy. Minimal-CTC reduces graph size and memory consumption by factors of two and four, respectively, at the cost of a small accuracy drop. Selfless-CTC can improve accuracy for models with a wide context window.
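To make the structural differences between the variants concrete, the following is a minimal sketch that enumerates the arcs of each topology as plain tuples. It is an illustration inferred from the one-line descriptions in the abstract, not the authors' implementation: the arc layout, the label ids (`BLANK = 0`, units `1..n`), the `EPS` marker, and all function names are assumptions made here, and the baseline `correct_ctc` follows the commonly used fully connected CTC topology.

```python
from typing import List, Tuple

# An arc is (source_state, destination_state, input_label, output_label).
Arc = Tuple[int, int, int, int]

BLANK = 0  # assumed id of the <blank> unit; real units are 1..n
EPS = -1   # hypothetical marker for <eps> (no symbol consumed or emitted)


def correct_ctc(n: int) -> List[Arc]:
    """Baseline: one state per label, fully connected, with self-loops."""
    arcs = []
    for src in range(n + 1):
        for dst in range(n + 1):
            # A unit is emitted only when entering its state from another state.
            emits = dst != BLANK and dst != src
            arcs.append((src, dst, dst, dst if emits else EPS))
    return arcs


def selfless_ctc(n: int) -> List[Arc]:
    """Baseline without self-loops on non-blank states."""
    return [a for a in correct_ctc(n) if not (a[0] == a[1] and a[0] != BLANK)]


def compact_ctc(n: int) -> List[Arc]:
    """Unit-to-unit arcs replaced by <eps> back-off arcs to the blank state."""
    arcs = [(0, 0, BLANK, EPS)]          # <blank> self-loop
    for u in range(1, n + 1):
        arcs.append((0, u, u, u))        # enter unit u and emit it
        arcs.append((u, u, u, EPS))      # repeat unit u without emitting
        arcs.append((u, 0, EPS, EPS))    # back off to the blank state
    return arcs


def minimal_ctc(n: int) -> List[Arc]:
    """Single state: a <blank> self-loop plus one emitting arc per unit."""
    return [(0, 0, BLANK, EPS)] + [(0, 0, u, u) for u in range(1, n + 1)]


if __name__ == "__main__":
    for build in (correct_ctc, selfless_ctc, compact_ctc, minimal_ctc):
        print(build.__name__, len(build(3)))
```

Under these assumptions, a vocabulary of n units gives (n+1)^2 arcs for the baseline, (n+1)^2 - n for the selfless sketch, 3n + 1 for the compact sketch, and n + 1 for the minimal sketch. These are the arc counts of the sketches only and are not meant to reproduce the decoding-graph or memory figures reported in the abstract.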

Topics

Machine Learning > Optimization & Theory > Optimization
Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

connectionist temporal classification, weighted finite-state transducer, wfst composition, decoding graph, blank transition
