A Mostly Data-Driven Approach to Inverse Text Normalization

Ernest Pusateri; Bharat Ram Ambati; Elizabeth Brooks; Ondrej Platek; Donald McAllaster; Venki Nagesha

INTERSPEECH 2017

Abstract

For an automatic speech recognition system to produce sensibly formatted, readable output, the spoken-form token sequence produced by the core speech recognizer must be converted to a written-form string. This process is known as inverse text normalization (ITN). Here we present a mostly data-driven ITN system that leverages a set of simple rules and a few hand-crafted grammars to cast ITN as a labeling problem. To this labeling problem, we apply a compact bi-directional LSTM. We show that the approach performs well using practical amounts of training data.
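To make the "ITN as a labeling problem" idea concrete, here is a minimal sketch, assuming a hypothetical label scheme that is not taken from the paper: each spoken-form token receives one label describing how to rewrite it, and the written-form string is obtained by applying the labels left to right. The names `Label`, `apply_labels`, and the fields `rewrite`, `capitalize`, and `join_left` are illustrative assumptions. In the system described above the labels would be predicted by the bi-directional LSTM; here they are written by hand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Label:
    """Hypothetical per-token edit label (illustrative, not the paper's label set)."""
    rewrite: Optional[str] = None  # written form to emit; None keeps the token, "" deletes it
    capitalize: bool = False       # upper-case the first character of the output
    join_left: bool = False        # attach to the previous output token without a space


def apply_labels(tokens: List[str], labels: List[Label]) -> str:
    """Convert spoken-form tokens to a written-form string using one label per token."""
    if len(tokens) != len(labels):
        raise ValueError("expected exactly one label per token")
    out: List[str] = []
    for token, label in zip(tokens, labels):
        written = token if label.rewrite is None else label.rewrite
        if label.capitalize:
            written = written[:1].upper() + written[1:]
        if not written:
            continue  # an empty rewrite deletes the token
        if label.join_left and out:
            out[-1] += written
        else:
            out.append(written)
    return " ".join(out)


# Hand-written labels standing in for model predictions.
tokens = ["wake", "me", "at", "ten", "thirty", "a", "m"]
labels = [
    Label(capitalize=True),
    Label(),
    Label(),
    Label(rewrite="10"),
    Label(rewrite=":30", join_left=True),
    Label(rewrite="AM"),
    Label(rewrite=""),
]
print(apply_labels(tokens, labels))  # Wake me at 10:30 AM
```

Because the output is fully determined by one label per input token, the learning problem reduces to ordinary sequence tagging, which is the kind of task a compact tagger can be trained on with modest amounts of data.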

Topics

Deep Learning > Architectures > Neural Networks
Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

sequence labeling, automatic speech recognition, bidirectional LSTM, inverse text normalization
