A Low Latency ASR-Free End to End Spoken Language Understanding System

Mohamed Mhiri; Samuel Myer; Vikrant Singh Tomar

INTERSPEECH 2020

Abstract

In recent years, developing a speech understanding system that maps a waveform directly to structured data, such as intents and slots, without first transcribing the speech to text has emerged as an interesting research problem. This work proposes such a system, with the additional constraint that its footprint be small enough to run on microcontrollers and embedded systems with minimal latency. Given a streaming input speech signal, the proposed system processes it segment by segment, without needing the entire stream at the moment of processing. The proposed system is evaluated on the publicly available Fluent Speech Commands dataset. Experiments show that it yields state-of-the-art performance with the advantage of low latency and a much smaller model than other published works on the same task.
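The segment-by-segment processing described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the names `segments`, `StreamingEnergy`, and `run` are hypothetical, and a running mean of signal energy stands in for the actual network. The only point it is meant to show is that each segment is consumed as soon as it is available and that only a small state is carried forward, so the full stream never needs to be held in memory.

```python
from typing import Iterable, Iterator, List


def segments(stream: Iterable[List[float]], segment_len: int) -> Iterator[List[float]]:
    """Regroup arbitrarily sized incoming chunks into fixed-length segments."""
    buffer: List[float] = []
    for chunk in stream:
        buffer.extend(chunk)
        # Emit every complete segment currently sitting in the buffer.
        while len(buffer) >= segment_len:
            yield buffer[:segment_len]
            buffer = buffer[segment_len:]
    if buffer:
        # Flush the final, possibly shorter, segment at end of stream.
        yield buffer


class StreamingEnergy:
    """Hypothetical stand-in for a streaming model: keeps only a small state."""

    def __init__(self) -> None:
        self.total = 0.0  # sum of squared samples seen so far
        self.count = 0    # number of samples seen so far

    def update(self, segment: List[float]) -> float:
        """Consume one segment and return the running mean energy."""
        self.total += sum(x * x for x in segment)
        self.count += len(segment)
        return self.total / self.count


def run(stream: Iterable[List[float]], segment_len: int) -> List[float]:
    """Feed a stream through the stand-in model, one segment at a time."""
    model = StreamingEnergy()
    return [model.update(seg) for seg in segments(stream, segment_len)]
```

In a real system the `update` step would be a forward pass of a small neural network over the new segment, with its recurrent or convolutional context playing the role of `total` and `count` here.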

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Application Areas > Data Augmentation
Speech & Audio > Recognition > Speech Recognition
Machine Learning > Learning Types > Transfer Learning
Machine Learning > Learning Types > Deep Learning

Keywords

representation learning, transfer learning, data augmentation, intent classification, spoken language understanding, audio classification, pre-trained model, low latency, slot filling, embedded system, speech mask detection
