Predicting Temporal Performance Drop of Deployed Production Spoken Language Understanding Models

Quynh Do; Judith Gaspers; Daniil Sorokin; Patrick Lehnen

INTERSPEECH 2021

Abstract

In deployed real-world spoken language understanding (SLU) applications, data continuously flows into the system. This leads to distributional differences between training and application data that can degrade model performance. While regularly retraining the deployed model on new data helps mitigate this problem, it incurs significant computational and human costs. In this paper, we develop a method that can help guide decisions on whether a model is safe to keep in production without notable performance loss or needs to be retrained. Towards this goal, we build a performance drop regression model for an SLU model that was trained offline, in order to detect potential model drift in the production phase. We present a wide range of experiments on multiple real-world datasets, indicating that our method is useful for guiding decisions in the SLU model development cycle and for reducing model retraining costs.
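The abstract does not say which drift features or which regressor the authors use, so the following is only a minimal illustrative sketch of the general idea (regress an observed performance drop on a data-shift score, then compare the predicted drop for a new production window against a tolerance). Everything here is an assumption: the shift score (Jensen-Shannon divergence between unigram token distributions), the single-feature linear regression, the function names `unigram_dist`, `js_divergence`, `fit_linear`, `should_retrain`, the `max_drop` threshold, and the "history" numbers are all hypothetical and are not the paper's method or results.

```python
import math
from collections import Counter


def unigram_dist(utterances):
    """Relative frequency of each lowercased whitespace token across utterances."""
    counts = Counter(tok for u in utterances for tok in u.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (bounded in [0, 1]) between two dict distributions."""
    vocab = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl_to_m(a):
        # KL(a || m); tokens absent from `a` contribute nothing.
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def fit_linear(xs, ys):
    """Closed-form least squares for y = intercept + slope * x (hypothetical regressor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def should_retrain(train_utts, prod_utts, model, max_drop):
    """Predict the drop for a production window and flag it if it exceeds max_drop."""
    intercept, slope = model
    shift = js_divergence(unigram_dist(train_utts), unigram_dist(prod_utts))
    predicted_drop = intercept + slope * shift
    return predicted_drop > max_drop, predicted_drop


if __name__ == "__main__":
    # Hypothetical history: shift score and observed drop (accuracy points) per past window.
    history_shift = [0.05, 0.10, 0.20, 0.30]
    history_drop = [0.5, 1.0, 2.0, 3.0]
    model = fit_linear(history_shift, history_drop)

    train = ["play some jazz", "set an alarm for seven"]
    prod = ["play some jazz", "what is the weather today"]
    retrain, drop = should_retrain(train, prod, model, max_drop=2.0)
    print(f"predicted drop: {drop:.2f} points, retrain: {retrain}")
```

A single shift feature keeps the regression closed-form and easy to inspect; in practice any other drift statistic (for example on intent-label distributions or model confidence scores) could be swapped in, and a multi-feature regressor would need a proper least-squares solver.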

Keywords

spoken language understanding, model drift, model deployment, regression model, performance drop prediction
