INTERSPEECH 2024

Sign Value Constraint Decomposition for Efficient 1-Bit Quantization of Speech Translation Tasks

Abstract

Speech-to-text translation is vital in converting speech input to text output in different languages. While combining speech and machine translation pre-trained models enhances translation quality, it also escalates the number of parameters, resulting in substantial hardware costs for model training and deployment. We propose a 1-bit quantized model based on Sign Value Constraint Decomposition (SVCD) for linear layers to address this challenge. SVCD approximates the weight matrix of the linear layer as a sign matrix and two trainable vectors, preserving higher information capacity at a minor space cost. Additionally, we utilize knowledge distillation to transfer the capability of the original fine-tuned model to the quantized model. The experimental results demonstrate the critical importance of the decoder's attention module in the performance of the quantized speech translation model. Our code is available at https://github.com/myaxxxxx/onebit-st.
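
The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so the following are assumptions rather than the authors' implementation: the weight matrix is approximated as W ≈ S ⊙ (a bᵀ), where S = sign(W), and the two vectors a and b are initialized from a rank-1 SVD of |W| (in training they would then be updated as trainable parameters). The function names `svcd_sketch`, `reconstruct`, and `linear_forward` are hypothetical.

```python
import numpy as np

def svcd_sketch(W):
    """Assumed form: W ~= S * outer(a, b), with S in {-1, +1}.

    The rank-1 SVD initialization of a and b is an assumption made
    for illustration; the abstract only states that two trainable
    vectors accompany the sign matrix.
    """
    # 1-bit part: the sign of every weight (zeros are mapped to +1).
    S = np.where(W >= 0, 1.0, -1.0)
    # Best rank-1 approximation of the magnitudes |W|.
    U, s, Vt = np.linalg.svd(np.abs(W), full_matrices=False)
    a = np.abs(U[:, 0]) * np.sqrt(s[0])   # one value per output row
    b = np.abs(Vt[0, :]) * np.sqrt(s[0])  # one value per input column
    return S, a, b

def reconstruct(S, a, b):
    """Rebuild the dense approximation of W."""
    return S * np.outer(a, b)

def linear_forward(x, S, a, b):
    """Compute x @ W_hat.T without materializing W_hat.

    x has shape (batch, in_features); S has shape (out_features, in_features).
    """
    return ((x * b) @ S.T) * a

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))   # toy weight matrix (out=8, in=16)
x = rng.normal(size=(4, 16))   # toy input batch

S, a, b = svcd_sketch(W)
W_hat = reconstruct(S, a, b)
y = linear_forward(x, S, a, b)

# Compare against a single scalar scale, alpha * sign(W).
alpha = np.abs(W).mean()
err_vectors = np.linalg.norm(W - W_hat)
err_scalar = np.linalg.norm(W - alpha * S)
print(f"two-vector error: {err_vectors:.4f}, scalar-scale error: {err_scalar:.4f}")
```

Under these assumptions, the extra storage beyond the 1-bit sign matrix is one value per output row and one per input column, which is the sense in which the two vectors come at a minor space cost. Because the reconstruction error equals the error of a rank-1 fit to |W|, this initialization cannot do worse in Frobenius norm than a single scalar scale.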
