A Hybrid Seq-2-Seq ASR Design for On-Device and Server Applications

Cyril Allauzen; Ehsan Variani; Michael Riley; David Rybach; Hao Zhang

INTERSPEECH 2021

Abstract

This paper proposes and evaluates alternative speech recognition design strategies using the hybrid autoregressive transducer (HAT) model. The strategies are designed with special attention to the choice of modeling units and to the integration of different types of external language models during first-pass beam search or second-pass rescoring. These approaches are compared on a large-scale voice search task, and recognition quality over the head and tail of the speech data is analyzed. Our experiments show decent improvements in word error rate (WER) on common speech phrases and significant gains on uncommon ones compared to state-of-the-art approaches.

Topics

Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

automatic speech recognition, language model, beam search, word error rate, hybrid autoregressive transducer
