The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results

Yan Jia; Xingming Wang; Xiaoyi Qin; Yinping Zhang; Xuyang Wang; Junjie Wang; Dong Zhang; Ming Li

INTERSPEECH 2021

Abstract

The 2020 Personalized Voice Trigger Challenge (PVTC2020) addresses two research problems in a unified setup: joint wake-up word detection and speaker verification on close-talking single-microphone data, and the same joint task on far-field multi-channel microphone array data. Specifically, the second task poses an additional cross-channel matching challenge on top of the far-field condition. To simulate a real-life application scenario, the enrollment utterances are recorded from a close-talking cell phone only, while the test utterances are recorded from both the close-talking cell phone and the far-field microphone arrays. This paper introduces the challenge setup, the released database, and the evaluation metrics. In addition, we present a sequential two-stage end-to-end neural network baseline system, trained on the proposed database, for speaker-dependent wake-up word detection. Results show that state-of-the-art personalized voice trigger methods are still based on the two-stage design; however, this benchmark database can also be used to evaluate multi-task joint learning methods. The official website, the open-source baseline system, and the results of submitted systems have been released.
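
To illustrate what a sequential two-stage design means in practice, the following is a minimal sketch, not the released baseline. It assumes that a wake-up word detector has already produced a score in [0, 1] and that a speaker model has already produced fixed-length embeddings for the enrollment and test utterances. The function names, the use of cosine similarity, and both threshold values are hypothetical choices made for illustration only.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def personalized_trigger(
    kws_score: float,
    enroll_emb: Sequence[float],
    test_emb: Sequence[float],
    kws_threshold: float = 0.5,  # hypothetical value, not from the paper
    sv_threshold: float = 0.6,   # hypothetical value, not from the paper
) -> bool:
    """Accept only if the wake-up word is detected AND the speaker matches."""
    # Stage 1: wake-up word detection. Reject early so that the
    # speaker-verification stage runs only on detected triggers.
    if kws_score < kws_threshold:
        return False
    # Stage 2: speaker verification against the enrolled speaker.
    return cosine_similarity(enroll_emb, test_emb) >= sv_threshold


if __name__ == "__main__":
    enrolled = [0.2, 0.9, 0.1]
    print(personalized_trigger(0.9, enrolled, [0.25, 0.85, 0.05]))
    print(personalized_trigger(0.1, enrolled, [0.25, 0.85, 0.05]))
```

In a cascade of this kind, the second stage sees only segments that passed the first, so the two thresholds trade off false rejections and false alarms jointly rather than independently.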

Topics

Speech & Audio > Recognition > Speech Recognition
Speech & Audio > Recognition > Speaker Recognition

Keywords

speaker verification; speaker recognition; end-to-end neural network; far-field speech; voice trigger; wake-up word
