Edinburgh’s Submissions to the 2020 Machine Translation Efficiency Task

Nikolay Bogoychev; Roman Grundkiewicz; Alham Fikri Aji; Maximiliana Behnke; Kenneth Heafield; Sidharth Kashyap; Emmanouil-Ioannis Farsarakis; Mateusz Chudyk

ACL 2020

Abstract

We participated in all tracks of the Workshop on Neural Generation and Translation 2020 Efficiency Shared Task: single-core CPU, multi-core CPU, and GPU. At the model level, we use teacher-student training with a variety of student sizes, tie embeddings and sometimes layers, use the Simpler Simple Recurrent Unit, and introduce head pruning. On GPUs, we used 16-bit floating-point tensor cores. On CPUs, we customized 8-bit quantization and multiple processes with affinity for the multi-core setting. To reduce model size, we experimented with 4-bit log quantization but use floats at runtime. In the shared task, most of our submissions were Pareto optimal with respect to the trade-off between time and quality.
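To illustrate the idea of storing weights with 4-bit log quantization while still computing in floats, here is a minimal sketch. It assumes one sign bit plus a 3-bit exponent and a per-tensor scale equal to the largest absolute weight; the function names `log_quantize` and `dequantize` and these specific choices are illustrative assumptions, not the scheme used in the submissions.

```python
import math

def log_quantize(weights, bits=4):
    """Encode each float as (sign, exponent); assumed layout: 1 sign bit + (bits-1) exponent bits."""
    levels = 2 ** (bits - 1)                     # number of distinct magnitudes
    scale = max(abs(w) for w in weights) or 1.0  # assumed per-tensor scale
    codes = []
    for w in weights:
        sign = 1 if w < 0 else 0
        if w == 0.0:
            k = levels - 1                       # zero maps to the smallest magnitude
        else:
            # Nearest power of two below the scale, in the log domain.
            k = round(-math.log2(abs(w) / scale))
            k = min(max(k, 0), levels - 1)
        codes.append((sign, k))
    return codes, scale

def dequantize(codes, scale):
    """Expand the compact codes back to floats for use at runtime."""
    return [(-1.0 if sign else 1.0) * scale * 2.0 ** -k for sign, k in codes]

codes, scale = log_quantize([0.5, -0.25, 0.1, 0.0])
restored = dequantize(codes, scale)
```

In this sketch only the codes and the scale would need to be stored on disk, so the model file shrinks, while inference still runs on the dequantized floats and gains no speed from the 4-bit format itself.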
