Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment

Ethan A. Chi; Julian Salazar; Katrin Kirchhoff

NAACL 2021

Abstract

Non-autoregressive encoder-decoder models greatly improve decoding speed over autoregressive models, at the expense of generation quality. To mitigate this, iterative decoding models repeatedly infill or refine the proposal of a non-autoregressive model. However, editing at the level of output sequences limits model flexibility. We instead propose *iterative realignment*, which refines latent alignments and thereby allows more flexible edits in fewer steps. Our model, Align-Refine, is an end-to-end Transformer that iteratively realigns connectionist temporal classification (CTC) alignments. On the WSJ dataset, Align-Refine matches an autoregressive baseline with a 14x decoding speedup; on LibriSpeech, we reach an LM-free test-other WER of 9.0% (a 19% relative improvement over comparable work) in three iterations. We release our code at https://github.com/amazon-research/align-refine.
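The abstract's central idea is that the model edits frame-level CTC alignments rather than output token sequences. The sketch below illustrates why that is flexible: the standard CTC collapse (merge repeated symbols, then drop blanks) means a single frame edit can insert, delete, or substitute an output token. It also shows a generic refinement loop. This is a minimal sketch under stated assumptions, not the released implementation: the blank symbol `_`, the names `ctc_collapse`, `iterative_realign`, and `toy_refine` are hypothetical, and `refine_step` merely stands in for the Align-Refine decoder, which would predict every frame in parallel from the encoder output and the previous alignment.

```python
from typing import Callable, List, Optional, Sequence

BLANK = "_"  # hypothetical blank symbol used only in this sketch


def ctc_collapse(alignment: Sequence[str], blank: str = BLANK) -> List[str]:
    """Map a frame-level CTC alignment to its output sequence:
    merge consecutive repeated symbols, then drop blanks."""
    output: List[str] = []
    prev: Optional[str] = None
    for sym in alignment:
        if sym != prev and sym != blank:
            output.append(sym)
        prev = sym
    return output


def iterative_realign(
    initial: List[str],
    refine_step: Callable[[List[str]], List[str]],
    num_iterations: int = 3,  # the abstract reports results after three iterations
) -> List[str]:
    """Repeatedly replace the whole alignment with a refined one,
    then collapse the final alignment into an output sequence.
    `refine_step` is a hypothetical stand-in for the refinement model."""
    alignment = list(initial)
    for _ in range(num_iterations):
        new_alignment = refine_step(alignment)
        # Early exit on a fixed point is an assumption of this sketch.
        if new_alignment == alignment:
            break
        alignment = new_alignment
    return ctc_collapse(alignment)


def toy_refine(alignment: List[str]) -> List[str]:
    """Hypothetical rule-based 'refiner': rewrites frame 1 to 'h'.
    A real model would re-predict all frames at once."""
    fixed = list(alignment)
    fixed[1] = "h"
    return fixed


if __name__ == "__main__":
    a0 = ["c", "c", "_", "a", "t", "t"]
    print(ctc_collapse(a0))                   # ['c', 'a', 't']
    # A single frame edit inserts a token in the output sequence:
    print(iterative_realign(a0, toy_refine))  # ['c', 'h', 'a', 't']
```

Because the collapse absorbs length changes, the refiner can always emit an alignment of the same length (one symbol per frame) while the output sequence grows or shrinks. This is what lets alignment-level edits be made non-autoregressively.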

Keywords

speech recognition, iterative refinement, connectionist temporal classification, non-autoregressive model
