Streaming On-Device End-to-End ASR System for Privacy-Sensitive Voice-Typing

Abhinav Garg; Gowtham P. Vadisetti; Dhananjaya Gowda; Sichen Jin; Aditya Jayasimha; Youngho Han; Jiyeon Kim; Junmo Park; Kwangyoun Kim; Sooyeon Kim; Young-Yoon Lee; Kyungbo Min; Chanwoo Kim

INTERSPEECH 2020

Abstract

In this paper, we present our streaming on-device end-to-end speech recognition solution for a privacy-sensitive voice-typing application, which primarily involves typing users' private details and passwords. We highlight challenges specific to the voice-typing scenario in the Korean language and propose solutions to these problems within the framework of a streaming attention-based speech recognition system. Important challenges in voice-typing include the choice of output units, the coupling of multiple characters into longer byte-pair encoded units, and the lack of sufficient training data. In addition to customizing a high-accuracy open-domain streaming speech recognition model for voice-typing applications, we retain the model's performance on open-domain tasks without significant degradation. We also explore domain biasing using shallow fusion with a weighted finite state transducer (WFST). We obtain approximately 13% relative word error rate (WER) improvement on our internal Korean voice-typing dataset without a WFST, and about 30% additional WER improvement with WFST fusion.
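
Shallow fusion, as named in the abstract, combines the end-to-end model's score with an external biasing score at each decoding step, commonly written as score(y) = log P_E2E(y | x) + λ · log P_bias(y). The Python sketch below illustrates that idea under stated assumptions: the biasing model is a hypothetical toy prefix-tree acceptor standing in for a real WFST, decoding is greedy rather than beam search, the per-step token distributions are made-up numbers rather than outputs of a streaming attention model, and failed partial matches are not penalized (production biasing WFSTs typically use failure arcs to subtract the accumulated bonus). All function names (build_bias_fst, fst_step, shallow_fusion_step, greedy_decode) are illustrative and are not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Toy deterministic weighted acceptor (hypothetical stand-in for a biasing WFST).
# arcs[state][token] = (next_state, weight); weights are log-domain bonuses.
Arcs = Dict[int, Dict[str, Tuple[int, float]]]


def build_bias_fst(phrases: List[List[str]], bonus: float) -> Arcs:
    """Build a prefix tree whose arcs reward each token of a biasing phrase."""
    arcs: Arcs = {0: {}}  # state 0 is the start state
    next_id = 1
    for phrase in phrases:
        state = 0
        for tok in phrase:
            if tok not in arcs[state]:
                arcs[state][tok] = (next_id, bonus)
                arcs[next_id] = {}
                next_id += 1
            state = arcs[state][tok][0]
    return arcs


def fst_step(arcs: Arcs, state: int, token: str) -> Tuple[int, float]:
    """Advance the acceptor by one token; on a miss, try restarting a phrase."""
    if token in arcs[state]:
        return arcs[state][token]
    if token in arcs[0]:
        return arcs[0][token]
    return 0, 0.0  # no match: back to start, no bonus (simplification)


def shallow_fusion_step(
    e2e_logprobs: Dict[str, float], arcs: Arcs, state: int, lam: float
) -> Dict[str, float]:
    """Fused score per candidate: log p_E2E(token) + lam * biasing weight."""
    return {
        tok: lp + lam * fst_step(arcs, state, tok)[1]
        for tok, lp in e2e_logprobs.items()
    }


def greedy_decode(
    steps: List[Dict[str, float]], arcs: Arcs, lam: float
) -> List[str]:
    """Pick the best fused token at each step and track the acceptor state."""
    state, out = 0, []
    for dist in steps:
        fused = shallow_fusion_step(dist, arcs, state, lam)
        best = max(fused, key=fused.get)
        state, _ = fst_step(arcs, state, best)
        out.append(best)
    return out


if __name__ == "__main__":
    # Hypothetical biasing phrase for typing an e-mail domain.
    arcs = build_bias_fst([["@", "gmail", ".com"]], bonus=1.0)
    # Made-up per-step log-probabilities from an E2E model.
    steps = [
        {"@": -0.1, "at": -2.5},
        {"gmail": -1.2, "g": -0.9, "mail": -2.0},
        {".com": -0.5, "com": -1.0},
    ]
    print(greedy_decode(steps, arcs, lam=0.0))  # ['@', 'g', '.com']
    print(greedy_decode(steps, arcs, lam=1.0))  # ['@', 'gmail', '.com']
```

With λ = 0 the decoder keeps the E2E model's preferred token "g"; with λ = 1 the biasing bonus lets the in-domain unit "gmail" win. The fusion weight λ trades such domain corrections against the risk of over-biasing open-domain speech.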

Keywords

attention mechanism, automatic speech recognition, end-to-end speech recognition, weighted finite state transducer, streaming speech recognition, voice typing