INTERSPEECH 2023

Lossless 4-bit Quantization of Architecture Compressed Conformer ASR Systems on the 300-hr Switchboard Corpus

Abstract

State-of-the-art end-to-end automatic speech recognition (ASR) systems are becoming increasingly complex and expensive to deploy in practical applications. This paper develops a high-performance, low-footprint 4-bit quantized Conformer ASR system. A key feature of the system design is that it accounts for the fine-grained, varying sensitivity of different Conformer components to quantization errors. Neural architecture compression and mixed precision quantization approaches were used to auto-configure the optimal substructures and quantization bit-widths within each Conformer submodule. Experiments conducted on the 300-hr Switchboard data suggest that the resulting auto-configured systems consistently outperform uniform precision quantized baseline Conformers of comparable bit-widths in terms of word error rate (WER). An overall "lossless" compression ratio of 16.2 times was obtained over the 32-bit full-precision baseline, with no statistically significant increase in WER.
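To make the compression arithmetic concrete, here is a minimal sketch under stated assumptions. It uses a generic symmetric per-tensor uniform quantizer and computes how an overall ratio against a 32-bit baseline combines two effects: parameters removed by architecture compression, and per-component bit-widths.

The helper names `quantize_uniform` and `compression_ratio` are hypothetical. The example component sizes and bit-widths are also illustrative only. Neither is the quantizer nor the configuration reported in the paper.

```python
from typing import List, Tuple


def quantize_uniform(weights: List[float], bits: int) -> List[float]:
    """Symmetric per-tensor uniform quantization; returns de-quantized values.

    Generic illustrative scheme (assumption), not necessarily the paper's quantizer.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for signed 4-bit integers
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / qmax
    # Round each weight to the nearest integer level, clip to [-qmax - 1, qmax],
    # then map back to the real-valued grid.
    return [max(-qmax - 1, min(qmax, round(w / scale))) * scale for w in weights]


def compression_ratio(components: List[Tuple[int, int, int]],
                      baseline_bits: int = 32) -> float:
    """Each component is (params_in_full_model, params_kept, bit_width).

    Ratio = (all full-model params at baseline_bits) / (kept params at their bit-widths).
    """
    full_bits = sum(n_full for n_full, _, _ in components) * baseline_bits
    compressed_bits = sum(n_kept * bits for _, n_kept, bits in components)
    return full_bits / compressed_bits


# Uniform 4-bit quantization with no parameters removed: 32 / 4 = 8x at most.
uniform = [(1_000_000, 1_000_000, 4)]

# Hypothetical mixed configuration: (full params, kept params, bits).
mixed = [
    (1_000_000, 500_000, 4),  # half the parameters removed, 4-bit
    (1_000_000, 500_000, 2),  # half removed, less sensitive, 2-bit
    (500_000, 500_000, 8),    # sensitive component kept whole at 8-bit
]

print(compression_ratio(uniform))            # 8.0
print(round(compression_ratio(mixed), 2))    # 11.43
```

The uniform case shows why bit-width reduction alone caps a 4-bit system at 8x over a 32-bit baseline. Ratios beyond that, such as the 16.2x reported above, additionally require fewer stored parameters, which is the role of architecture compression.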
