How Does Quantization Affect Multilingual LLMs?

Kelly Marchisio; Saurabh Dash; Hongyu Chen; Dennis Aumiller; Ahmet Üstün; Sara Hooker; Sebastian Ruder

2024 EMNLP EMNLP 2024

How Does Quantization Affect Multilingual LLMs?

Abstract

AbstractQuantization techniques are widely used to improve inference speed and deployment of large language models. While a wide body of work examines the impact of quantization on LLMs in English, none have evaluated across languages. We conduct a thorough analysis of quantized multilingual LLMs, focusing on performance across languages and at varying scales. We use automatic benchmarks, LLM-as-a-Judge, and human evaluation, finding that (1) harmful effects of quantization are apparent in human evaluation, which automatic metrics severely underestimate: a 1.7% average drop in Japanese across automatic tasks corresponds to a 16.0% drop reported by human evaluators on realistic prompts; (2) languages are disparately affected by quantization, with non-Latin script languages impacted worst; and (3) challenging tasks like mathematical reasoning degrade fastest. As the ability to serve low-compute models is critical for wide global adoption of NLP technologies, our results urge consideration of multilingual performance as a key evaluation criterion for efficient models.

❓ The Questioner

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — automatic benchmark

🐣 Hot Topic Early Bird — multilingual large language model

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Kelly Marchisio , Saurabh Dash , Hongyu Chen , Dennis Aumiller , Ahmet Üstün , Sara Hooker , Sebastian Ruder

Topics

Machine Learning > Application Areas > Efficient Computing Natural Language Processing > Resources & Methods > Large Language Models Natural Language Processing > Resources & Methods > Multilingual NLP Deep Learning > Models > Large Language Models Deep Learning > Optimization & Theory > Efficient Computing

Keywords

model compression model quantization mathematical reasoning performance evaluation inference efficiency human evaluation multilingual language model multilingual large language model automatic benchmark

Download PDF

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024