Modeling ASR Ambiguity for Neural Dialogue State Tracking

Vaishali Pal; Fabien Guillot; Manish Shrivastava; Jean-Michel Renders; Laurent Besacier

INTERSPEECH 2020

Abstract

Spoken dialogue systems typically use one or several (top-N) ASR sequence(s) for inferring the semantic meaning and tracking the state of the dialogue. However, ASR graphs, such as confusion networks (confnets), provide a compact representation of a richer hypothesis space than a top-N ASR list. In this paper, we study the benefits of using confusion networks with a neural dialogue state tracker (DST). We encode the 2-dimensional confnet into a 1-dimensional sequence of embeddings using a confusion network encoder that can be used with any DST system. Our confnet encoder is plugged into the ‘Global-locally Self-Attentive Dialogue State Tracker’ (GLAD) model for DST and obtains significant improvements in both accuracy and inference time compared to using top-N ASR hypotheses.
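
The abstract does not specify how the confnet encoder collapses each slot of competing arcs into a single vector. The sketch below assumes the simplest option, a posterior-weighted average of word embeddings per slot; the published encoder may instead learn weights over arcs (for example with attention). The names `count_paths` and `encode_confnet`, the toy vocabulary, and the 2-dimensional embeddings are all hypothetical. The path count also illustrates the "compact representation of a richer hypothesis space" claim: a handful of arcs already encodes more paths than a typical top-N list.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a confnet is a list of slots (time bins),
# each slot a list of competing (word, posterior) arcs.
ConfNet = List[List[Tuple[str, float]]]


def count_paths(confnet: ConfNet) -> int:
    """Number of paths through the confnet: the product of per-slot arc counts."""
    n = 1
    for slot in confnet:
        n *= len(slot)
    return n


def encode_confnet(
    confnet: ConfNet, embeddings: Dict[str, List[float]], dim: int
) -> List[List[float]]:
    """Flatten a 2-D confnet into a 1-D sequence with one vector per slot.

    Assumption (not necessarily the paper's method): each slot vector is the
    posterior-weighted mean of its arc embeddings, with posteriors
    renormalised per slot. Words missing from `embeddings`, including the
    epsilon arc "<eps>", map to a zero vector.
    """
    sequence = []
    for slot in confnet:
        total = sum(p for _, p in slot)
        if total <= 0:
            raise ValueError("slot posteriors must sum to a positive value")
        vec = [0.0] * dim
        for word, p in slot:
            emb = embeddings.get(word, [0.0] * dim)
            weight = p / total
            for i in range(dim):
                vec[i] += weight * emb[i]
        sequence.append(vec)
    return sequence


if __name__ == "__main__":
    confnet = [
        [("i", 0.9), ("eye", 0.1)],
        [("want", 0.6), ("won't", 0.4)],
        [("cheap", 0.7), ("chip", 0.2), ("<eps>", 0.1)],
    ]
    emb = {
        "i": [1.0, 0.0], "eye": [0.0, 1.0],
        "want": [1.0, 1.0], "won't": [-1.0, 1.0],
        "cheap": [2.0, 0.0], "chip": [0.0, 2.0],
    }
    print(count_paths(confnet))  # 12 paths from only 7 arcs
    for vec in encode_confnet(confnet, emb, dim=2):
        print([round(x, 3) for x in vec])
```

Because every slot yields exactly one vector, the output has the same shape as an embedded word sequence, which is what allows an encoder of this kind to sit in front of a DST model that consumes token embeddings, as the abstract describes for GLAD.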

Keywords

automatic speech recognition, dialogue state tracking, semantic meaning, spoken dialogue, confusion network, neural network, ASR ambiguity
