Layer-wise Swapping for Generalizable Multilingual Safety

Hyunseo Shin; Wonseok Hwang

2026 EACL EACL 2026

Layer-wise Swapping for Generalizable Multilingual Safety

Abstract

AbstractDespite the rapid advancements of Large Language Models (LLMs), safety risks remain a critical challenge for low-resource languages. Existing safety datasets are predominantly English-centric, limiting progress in multilingual safety alignment. As a result, low-resource expert models—fine-tuned on their respective instruction datasets—tend to exhibit higher unsafety rates compared to their high-resource counterparts. In this work, we propose a safety aware layer swapping method that transfers safety alignment from an English safety expert to low-resource language experts without additional training. To further enhance transfer ability, our method adaptively selects or blends modules based on their degree of specialization. Our approach preserves performance on general language understanding tasks while enhancing safety in the target languages. Experimental results show that the proposed method achieves comparable performance to the language expert on general benchmarks such as MMMLU, BELEBELE, and MGSM, while producing more aligned and less harmful responses on the MultiJail safety benchmark

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Hyunseo Shin , Wonseok Hwang

Topics

Artificial Intelligence > Core AI > AI Safety Machine Learning > Application Areas > Domain Adaptation Natural Language Processing > Resources & Methods > Large Language Models

Keywords

transfer learning domain adaptation safety alignment multilingual safety large language model

Download PDF

Related papers

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection 2026

Generative Personality Simulation via Theory-Informed Structured Interview 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs 2026