X-Instruction: Aligning Language Model in Low-resource Languages with Self-curated Cross-lingual Instructions

Chong Li; Wen Yang; Jiajun Zhang; Jinliang Lu; Shaonan Wang; Chengqing Zong

2024 ACL ACL 2024

X-Instruction: Aligning Language Model in Low-resource Languages with Self-curated Cross-lingual Instructions

Abstract

AbstractLarge language models respond well in high-resource languages like English but struggle in low-resource languages. It may arise from the lack of high-quality instruction following data in these languages. Directly translating English samples into these languages can be a solution but unreliable, leading to responses with translation errors and lacking language-specific or cultural knowledge. To address this issue, we propose a novel method to construct cross-lingual instruction following samples with instruction in English and response in low-resource languages. Specifically, the language model first learns to generate appropriate English instructions according to the natural web texts in other languages as responses. The candidate cross-lingual instruction tuning samples are further refined and diversified. We have employed this method to build a large-scale cross-lingual instruction tuning dataset on 10 languages, namely X-Instruction. The instruction data built using our method incorporate more language-specific knowledge compared with the naive translation method. Experimental results have shown that the response quality of the model tuned on X-Instruction greatly exceeds the model distilled from a powerful teacher model, reaching or even surpassing the ones of ChatGPT. In addition, we find that models tuned on cross-lingual instruction following samples can follow the instruction in the output language without further tuning.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Natural Language Processing

🧭 Keyword Pioneer — cross-lingual instruction

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Chong Li , Wen Yang , Jiajun Zhang , Jinliang Lu , Shaonan Wang , Chengqing Zong

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning Natural Language Processing > Resources & Methods > Large Language Models Natural Language Processing > Resources & Methods > Multilingual NLP Natural Language Processing > Resources & Methods > Transfer Learning Deep Learning > Learning Types > Fine-Tuning

Keywords

transfer learning multilingual nlp cross-lingual transfer instruction tuning low-resource language cross-lingual instruction language-specific knowledge

Download PDF

Related papers

Reinforcement Learning-Driven LLM Agent for Automated Attacks on LLMs 2024

EtymoLink: A Structured English Etymology Dataset 2024

Turkish Delights: A Dataset on Turkish Euphemisms 2024

Subjectivity Detection in English News using Large Language Models 2024

Does DetectGPT Fully Utilize Perturbation? Bridging Selective Perturbation to Fine-tuned Contrastive Learning Detector would be Better 2024