Domain-Specific Utterance End-Point Detection for Speech Recognition

Roland Maas; Ariya Rastrow; Kyle Goehner; Gautam Tiwari; Shaun Joseph; Bjorn Hoffmeister

INTERSPEECH 2017

Abstract

The task of automatically detecting the end of a device-directed user request is particularly challenging when requests switch between short commands and long free-form utterances. Low-latency end-pointing configurations typically give a good user experience for short requests, such as “play music”. However, they can be too aggressive in domains with longer free-form queries, where users tend to pause noticeably between words and are therefore easily cut off prematurely. We previously proposed an approach for accurate end-pointing that continuously estimates pause duration features over all active recognition hypotheses. In this paper, we study the behavior of these pause duration features and infer domain-dependent parametrizations. We furthermore propose to adapt the end-pointer aggressiveness on the fly by comparing the Viterbi scores of active short-command vs. long free-form decoding hypotheses. The experimental evaluation shows an 18% relative reduction in word error rate on free-form requests while maintaining low latency on short queries.
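The two ingredients described in the abstract, a pause duration feature computed over all active hypotheses and an aggressiveness setting chosen by comparing short-command and free-form Viterbi scores, can be illustrated with a minimal sketch. It assumes that the pause feature is a posterior-weighted average of each hypothesis's trailing pause, that Viterbi scores are log scores where higher is better, and that every active hypothesis can be tagged as short-command or free-form. All names (`Hypothesis`, `expected_pause`, `select_threshold`, `should_endpoint`), the frame thresholds and the margin rule are hypothetical illustrations, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One active decoding hypothesis (hypothetical structure for illustration)."""
    log_score: float    # Viterbi log score; higher is better (assumption)
    pause_frames: int   # trailing non-speech frames since this hypothesis's last word
    is_free_form: bool  # True if the hypothesis belongs to the long free-form domain


def posteriors(hyps: List[Hypothesis]) -> List[float]:
    """Turn log scores into posteriors with a numerically stable softmax."""
    best = max(h.log_score for h in hyps)
    weights = [math.exp(h.log_score - best) for h in hyps]
    total = sum(weights)
    return [w / total for w in weights]


def expected_pause(hyps: List[Hypothesis]) -> float:
    """Posterior-weighted pause duration over all active hypotheses (assumed form)."""
    return sum(p * h.pause_frames for p, h in zip(posteriors(hyps), hyps))


def select_threshold(hyps: List[Hypothesis], short_thr: int,
                     free_form_thr: int, margin: float = 0.0) -> int:
    """Pick a domain-dependent pause threshold by comparing the best Viterbi
    score of free-form hypotheses against that of short-command hypotheses."""
    short = [h.log_score for h in hyps if not h.is_free_form]
    free = [h.log_score for h in hyps if h.is_free_form]
    if not free:
        return short_thr
    if not short:
        return free_form_thr
    # A winning free-form hypothesis makes the end-pointer less aggressive.
    return free_form_thr if max(free) - max(short) > margin else short_thr


def should_endpoint(hyps: List[Hypothesis], short_thr: int = 30,
                    free_form_thr: int = 60, margin: float = 0.0) -> bool:
    """Declare end of utterance once the expected pause reaches the threshold.
    Thresholds are in frames (e.g. 30 frames = 300 ms at a 10 ms frame shift)."""
    return expected_pause(hyps) >= select_threshold(hyps, short_thr, free_form_thr, margin)


# Usage: the same 40-frame pause ends a short command but not a free-form query.
command = [Hypothesis(-100.0, 40, False), Hypothesis(-102.0, 40, True)]
dictation = [Hypothesis(-105.0, 40, False), Hypothesis(-100.0, 40, True)]
print(should_endpoint(command), should_endpoint(dictation))
```

In this sketch the decision is re-evaluated every frame, so the threshold can move as the ranking of short-command and free-form hypotheses changes during decoding. The `margin` parameter is one possible way to keep the threshold from flipping back and forth when the two scores are close.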

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Application Areas > Domain Adaptation
Speech & Audio > Recognition > Speech Recognition
Speech & Audio > Analysis > Prosody Analysis

Keywords

domain adaptation, speech recognition, utterance classification, endpoint detection, pause duration, end-point detection
