Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models

Tianyi Tang; Wenyang Luo; Haoyang Huang; Dongdong Zhang; Xiaolei Wang; Xin Zhao; Furu Wei; Ji-Rong Wen

2024 ACL ACL 2024

Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models

Abstract

AbstractLarge language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora.It remains a challenging problem to explain the underlying mechanisms by which LLMs process multilingual texts.In this paper, we delve into the composition of Transformer architectures in LLMs to pinpoint language-specific regions.Specially, we propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs.Based on LAPE, we conduct comprehensive experiments on several representative LLMs, such as LLaMA-2, BLOOM, and Mistral. Our findings indicate that LLMs’ proficiency in processing a particular language is predominantly due to a small subset of neurons, primarily situated in the models’ top and bottom layers.Furthermore, we showcase the feasibility to “steer” the output language of LLMs by selectively activating or deactivating language-specific neurons. Our research provides important evidence to the understanding and exploration of the multilingual capabilities of LLMs.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Natural Language Processing

🧭 Keyword Pioneer — language-specific neuron

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Speech & Audio

🐣 Hot Topic Early Bird — multilingual processing

Authors

Tianyi Tang , Wenyang Luo , Haoyang Huang , Dongdong Zhang , Xiaolei Wang , Xin Zhao , Furu Wei , Ji-Rong Wen

Topics

Artificial Intelligence > Core AI > Interpretability Natural Language Processing > Resources & Methods > Large Language Models Natural Language Processing > Resources & Methods > Multilingual NLP Machine Learning > Learning Types > Representation Learning Artificial Intelligence > Core AI > Large Language Models Natural Language Processing > Resources & Methods > Language Modeling

Keywords

transformer architecture multilingual processing neuron analysis language-specific neuron multilingual capability neuron activation large language model language activation

Download PDF

Related papers

Reinforcement Learning-Driven LLM Agent for Automated Attacks on LLMs 2024

EtymoLink: A Structured English Etymology Dataset 2024

Turkish Delights: A Dataset on Turkish Euphemisms 2024

Subjectivity Detection in English News using Large Language Models 2024

Does DetectGPT Fully Utilize Perturbation? Bridging Selective Perturbation to Fine-tuned Contrastive Learning Detector would be Better 2024