A De Novo Divide-and-Merge Paradigm for Acoustic Model Optimization in Automatic Speech Recognition

Conghui Tan; Di Jiang; Jinhua Peng; Xueyang Wu; Qian Xu; Qiang Yang

2020 IJCAI IJCAI 2020

A De Novo Divide-and-Merge Paradigm for Acoustic Model Optimization in Automatic Speech Recognition

Abstract

Due to the rising awareness of privacy protection and the voluminous scale of speech data, it is becoming infeasible for Automatic Speech Recognition (ASR) system developers to train the acoustic model with complete data as before. In this paper, we propose a novel Divide-and-Merge paradigm to solve salient problems plaguing the ASR field. In the Divide phase, multiple acoustic models are trained based upon different subsets of the complete speech data, while in the Merge phase two novel algorithms are utilized to generate a high-quality acoustic model based upon those trained on data subsets. We first propose the Genetic Merge Algorithm (GMA), which is a highly specialized algorithm for optimizing acoustic models but suffers from low efficiency. We further propose the SGD-Based Optimizational Merge Algorithm (SOMA), which effectively alleviates the efficiency bottleneck of GMA and maintains superior performance. Extensive experiments on public data show that the proposed methods can significantly outperform the state-of-the-art.

🌉 Interdisciplinary Bridge — Machine Learning and Speech & Audio

🐣 Hot Topic Early Bird — model merging

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio