Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis

Cheung-Chi Leung; Lei Wang; Haihua Xu; Jingyong Hou; Van Tung Pham; Hang Lv; Lei Xie; Xiong Xiao; Chongjia Ni; Bin Ma; Eng Siong Chng; Haizhou Li

INTERSPEECH 2016

Abstract

This paper documents the significant components of a state-of-the-art language-independent query-by-example spoken term detection system designed for the Query by Example Search on Speech Task (QUESST) in MediaEval 2015. We developed exact and partial matching dynamic time warping (DTW) systems and weighted finite-state transducer (WFST) based symbolic search systems to handle different types of search queries. To handle the noisy and reverberant speech in the task, we trained tokenizers using data augmented with different noise and reverberation conditions. Our post-evaluation analysis showed that the phone boundary labels provided by the improved tokenizers yield more accurate speech activity detection in the DTW systems. We argue that acoustic condition mismatch is possibly a more important factor than language mismatch for obtaining consistent gains from stacked bottleneck features. Our post-evaluation system, which involves fewer component systems, can outperform our submitted systems, which performed best in the task.
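As a rough illustration of the DTW-based matching mentioned in the abstract, the sketch below implements a generic subsequence DTW in plain Python. It is not the authors' implementation: the function names `cosine_distance` and `subsequence_dtw`, the cosine frame distance, the three-way step pattern, and the normalization by query length are all assumptions chosen for brevity. In practice the frames would be feature vectors such as bottleneck features rather than the toy 2-D vectors used here.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat an all-zero frame as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def subsequence_dtw(query, utterance):
    """Find the region of `utterance` that best matches the whole `query`.

    Returns (cost, start, end): the accumulated frame distance divided by
    the query length (an assumed, simplified normalization) and the
    inclusive frame range of the match in `utterance`.
    """
    n, m = len(query), len(utterance)
    inf = float("inf")
    # cost[i][j]: best accumulated distance for query[0..i] ending at frame j.
    cost = [[inf] * m for _ in range(n)]
    # start[i][j]: utterance frame where that best path began.
    start = [[0] * m for _ in range(n)]

    # The match may begin at any utterance frame, so the first row
    # carries no accumulated cost from earlier frames.
    for j in range(m):
        cost[0][j] = cosine_distance(query[0], utterance[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            # Predecessors: (i-1, j), (i-1, j-1), (i, j-1).
            best, best_start = cost[i - 1][j], start[i - 1][j]
            if j > 0:
                if cost[i - 1][j - 1] < best:
                    best, best_start = cost[i - 1][j - 1], start[i - 1][j - 1]
                if cost[i][j - 1] < best:
                    best, best_start = cost[i][j - 1], start[i][j - 1]
            cost[i][j] = best + cosine_distance(query[i], utterance[j])
            start[i][j] = best_start

    # The match may also end at any utterance frame.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end] / n, start[n - 1][end], end


if __name__ == "__main__":
    toy_query = [[1.0, 0.0], [0.0, 1.0]]
    toy_utterance = [[1.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(subsequence_dtw(toy_query, toy_utterance))
```

Leaving the first row free of accumulated cost and taking the minimum over the last row is what lets the query align to any region of the utterance instead of the whole of it. A detection score would typically be derived from the returned cost and compared against a threshold; that step is omitted here.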

Authors

Cheung-Chi Leung, Lei Wang, Haihua Xu, Jingyong Hou, Van Tung Pham, Hang Lv, Lei Xie, Xiong Xiao, Chongjia Ni, Bin Ma, Eng Siong Chng, Haizhou Li

Topics

Computer Science > Applications > Information Retrieval
Machine Learning > Core Methods > Sequence Modeling

Keywords

dynamic time warping; bottleneck feature; spoken term detection; weighted finite-state transducer; speech tokenizer

Related papers

A Feature Study for Masking-Based Reverberant Speech Separation 2016

Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling 2016

Today’s Most Frequently Used F0 Estimation Methods, and Their Accuracy in Estimating Male and Female Pitch in Clean Speech 2016

A Sparse Spherical Harmonic-Based Model in Subbands for Head-Related Transfer Functions 2016
