INTERSPEECH 2017

Investigating Scalability in Hierarchical Language Identification System

Abstract

State-of-the-art language identification (LID) systems do not scale easily to accommodate new languages. Specifically, as the number of target languages grows, the error rate of these LID systems increases rapidly. This paper addresses this challenge by adopting a hierarchical language identification (HLID) framework and demonstrates its superior scalability. In particular, adding languages to an HLID system only requires training the relevant nodes in the hierarchical structure instead of re-training the entire tree. Experiments conducted on a dataset that combines languages from the NIST LRE 2007, 2009, 2011 and 2015 databases show that as the number of target languages grows from 28 to 42, the performance of a single-level (non-hierarchical) system deteriorates by around 11%, while that of the hierarchical system deteriorates by only about 3.4%, in terms of Cavg. Finally, the experiments also suggest that SVM-based systems are more scalable than GPLDA-based systems.
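The scalability argument can be illustrated with a minimal sketch. It assumes that the "relevant nodes" are those on the path from the root to the cluster that receives the new language; the abstract does not state the exact criterion, so this is an illustration rather than the authors' procedure. The `Node` class, the `add_language` and `count_nodes` helpers, and the cluster and language names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One classifier node in a hypothetical language hierarchy."""
    name: str
    children: list[Node] = field(default_factory=list)
    languages: set[str] = field(default_factory=set)  # languages covered by this node


def add_language(root: Node, cluster_path: list[str], language: str) -> list[str]:
    """Add `language` under the cluster reached by `cluster_path`.

    Returns the names of the nodes whose classifier would need retraining,
    ASSUMING only nodes on the root-to-cluster path are affected.
    """
    node = root
    node.languages.add(language)
    retrain = [node.name]
    for name in cluster_path:
        # Descend to the named child; raises StopIteration if it does not exist.
        node = next(c for c in node.children if c.name == name)
        node.languages.add(language)
        retrain.append(node.name)
    return retrain


def count_nodes(node: Node) -> int:
    """Total number of classifier nodes in the tree."""
    return 1 + sum(count_nodes(c) for c in node.children)


# Hypothetical two-level hierarchy; the groupings are illustrative only.
root = Node(
    "root",
    children=[
        Node("Slavic", languages={"Russian", "Polish"}),
        Node("Iberian", languages={"Spanish", "Portuguese"}),
        Node("Chinese", languages={"Mandarin", "Cantonese"}),
    ],
)
for child in root.children:
    root.languages |= child.languages

retrain = add_language(root, ["Slavic"], "Ukrainian")
print(f"retrain {len(retrain)} of {count_nodes(root)} nodes: {retrain}")
```

Under this assumption, the "Iberian" and "Chinese" nodes are left untouched, whereas a single-level system would have to retrain its one classifier over every target language.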
