HW-TSC’s Submission for the WMT22 Efficiency Task

Hengchao Shang; Ting Hu; Daimeng Wei; Zongyao Li; Xianzhi Yu; Jianfei Feng; Ting Zhu; Lizhi Lei; Shimin Tao; Hao Yang; Ying Qin; Jinlong Yang; Zhiqiang Rao; Zhengzhe Yu

EMNLP 2022

Abstract

This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2022 Efficiency Shared Task. For this year’s task, we again apply a sentence-level distillation strategy to train small models with different configurations. We then integrate the average attention mechanism into the lightweight RNN model to pursue more efficient decoding. We also tried adding a retraining step to our 8-bit and 4-bit models to balance model size against quality. As before, we use Huawei Noah’s Bolt for INT8 inference and 4-bit storage. Taking advantage of Bolt’s support for batch inference and multi-core parallel computing, we submit models with different configurations to the CPU latency and throughput tracks to explore the Pareto frontiers.
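As an illustration of why an average attention mechanism can make decoding cheaper, the following minimal sketch computes a cumulative average over decoder inputs, both in one pass (as in training) and incrementally (as in step-by-step decoding). It is an assumption-laden sketch and not the authors' implementation: the names `cumulative_average` and `IncrementalAverage` are hypothetical, and the feed-forward and gating layers that usually accompany average attention are omitted.

```python
from typing import List

Vector = List[float]


def cumulative_average(inputs: List[Vector]) -> List[Vector]:
    """For each position j, return the mean of inputs[0..j] (inclusive)."""
    outputs: List[Vector] = []
    running_sum: Vector = [0.0] * len(inputs[0]) if inputs else []
    for j, vec in enumerate(inputs, start=1):
        running_sum = [s + v for s, v in zip(running_sum, vec)]
        outputs.append([s / j for s in running_sum])
    return outputs


class IncrementalAverage:
    """Decoding-time state (hypothetical helper): a running sum and a step count.

    Each step costs O(dim) regardless of how many tokens were already
    generated, whereas self-attention must revisit all previous positions.
    """

    def __init__(self, dim: int) -> None:
        self.running_sum: Vector = [0.0] * dim
        self.steps = 0

    def step(self, vec: Vector) -> Vector:
        self.steps += 1
        self.running_sum = [s + v for s, v in zip(self.running_sum, vec)]
        return [s / self.steps for s in self.running_sum]


if __name__ == "__main__":
    toy_inputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(cumulative_average(toy_inputs))
    state = IncrementalAverage(dim=2)
    print([state.step(v) for v in toy_inputs])
```

Both paths produce the same averages; the incremental form only keeps a fixed-size state between steps, which is the property that makes it attractive for latency-sensitive decoding.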

Topics

Artificial Intelligence > Core AI > Model Compression; Machine Learning > Application Areas > Efficient Computing; Deep Learning > Techniques > Model Architecture; Natural Language Processing > Applications > Machine Translation; Machine Learning > Application Areas > Model Compression; Deep Learning > Learning Types > Knowledge Distillation; Deep Learning > Optimization & Theory > Efficient Computing

Keywords

model compression; knowledge distillation; neural machine translation; efficient computing; neural network; batch inference
