Device-directed Utterance Detection

Sri Harish Mallidi; Roland Maas; Kyle Goehner; Ariya Rastrow; Spyros Matsoukas; Bjorn Hoffmeister

INTERSPEECH 2018

Abstract

We propose a classifier that distinguishes device-directed queries from background speech in interactions with voice assistants. Applications include rejecting false wake-ups and unintended interactions, and enabling wake-word-free follow-up queries. Consider the interaction "Computer, play music", "Computer, reduce the volume": the user must repeat the wake-word ("Computer") for the second query. To allow more natural interactions, the device could re-enter the listening state immediately after the first query, without wake-word repetition, and accept or reject a potential follow-up as device-directed or background speech. The proposed model consists of two long short-term memory (LSTM) neural networks, one trained on acoustic features and the other on automatic speech recognition (ASR) 1-best hypotheses. A feed-forward deep neural network (DNN) is then trained to combine the acoustic and 1-best embeddings derived from the LSTMs with features from the ASR decoder. Used individually, the ASR decoder features, acoustic embeddings, and 1-best embeddings yield equal error rates (EER) of 9.3%, 10.9%, and 20.1%, respectively. Combining the features yields a final EER of 5.2%, a 44% relative improvement over the best individual feature set (9.3%).
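The EER figures quoted in the abstract can be made concrete with a small sketch. The abstract does not say how the EER was computed, so the threshold sweep below is only one common way to estimate it, not the authors' evaluation code. The function names `equal_error_rate` and `relative_reduction` and the toy scores are hypothetical. The sketch assumes that a higher score means "more likely device-directed" and that an utterance is accepted when its score is at or above the threshold.

```python
def equal_error_rate(scores, labels):
    """Estimate the EER by sweeping a decision threshold.

    scores: classifier outputs, higher = more likely device-directed (assumption).
    labels: 1 for device-directed utterances, 0 for background speech.
    Returns (eer, threshold) at the point where the false-accept rate and
    the false-reject rate are closest to each other.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one utterance of each class")

    best = None  # (gap, eer, threshold)
    # Candidate thresholds: every distinct score, plus +inf (reject everything).
    for thr in sorted(set(scores)) + [float("inf")]:
        false_accepts = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= thr)
        false_rejects = sum(1 for s, y in zip(scores, labels) if y == 1 and s < thr)
        far = false_accepts / n_neg
        frr = false_rejects / n_pos
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


def relative_reduction(baseline, improved):
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    return (baseline - improved) / baseline


# Toy data, made up for illustration only (not from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]
eer, threshold = equal_error_rate(toy_scores, toy_labels)
print(f"toy EER = {eer:.2f} at threshold {threshold}")

# Arithmetic behind the abstract's relative improvement: 9.3% -> 5.2% EER.
print(f"relative improvement = {100 * relative_reduction(9.3, 5.2):.0f}%")
```

On the toy data, one background utterance out of four is accepted and one device-directed utterance out of four is rejected at threshold 0.6, so both error rates are 0.25. The second print confirms that moving from 9.3% to 5.2% EER is a reduction of roughly 44%.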

Topics

Machine Learning > Core Methods > Classification
Speech & Audio > Recognition > Speech Recognition

Keywords

automatic speech recognition; long short-term memory; utterance detection; device-directed query; false wake-up rejection
