MindMerger: Efficiently Boosting LLM Reasoning in non-English Languages

Zixian Huang; Wenhao Zhu; Gong Cheng; Lei Li; Fei Yuan

2024 NIPS NeurIPS 2024

MindMerger: Efficiently Boosting LLM Reasoning in non-English Languages

Abstract

Reasoning capabilities are crucial for Large Language Models~(LLMs), yet a notable gap exists between English and non-English languages. To bridge this disparity, some works fine-tune LLMs to relearn reasoning capabilities in non-English languages, while others replace non-English inputs with an external model's outputs such as English translation text to circumvent the challenge of LLM understanding non-English. Unfortunately, these methods often underutilize the built-in skilled reasoning and useful language understanding capabilities of LLMs. In order to better utilize the minds of reasoning and language understanding in LLMs, we propose a new method, namely MergeMinds, which merges LLMs with the external language understanding capabilities from multilingual models to boost the multilingual reasoning performance. Furthermore, a two-step training scheme is introduced to first train to embeded the external capabilities into LLMs and then train the collaborative utilization of the external capabilities and the built-in capabilities in LLMs. Experiments on three multilingual reasoning datasets and a language understanding dataset demonstrate that MergeMinds consistently outperforms all baselines, especially in low-resource languages. Without updating the parameters of LLMs, the average accuracy improved by 6.7 and 8.0 across all languages and low-resource languages on the MGSM dataset, respectively.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🧭 Keyword Pioneer — multilingual reasoning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Speech & Audio

Authors

Zixian Huang , Wenhao Zhu , Gong Cheng , Lei Li , Fei Yuan

Topics

Artificial Intelligence > Core AI > Foundation Models Artificial Intelligence > Core AI > Multimodal Learning Machine Learning > Application Areas > Model Merging Machine Learning > Learning Types > Multi-Task Learning Artificial Intelligence > Core AI > Large Language Models Natural Language Processing > Applications > Natural Language Inference Deep Learning > Models > Large Language Models Artificial Intelligence > Core AI > Language Deep Learning > Learning Types > Multi-Lingual Learning

Keywords

cross-lingual transfer model merging language understanding low-resource language parameter efficiency multilingual reasoning large language model

Download PDF

Related papers

SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers 2024

Training for Stable Explanation for Free 2024

NeuralSolver: Learning Algorithms For Consistent and Efficient Extrapolation Across General Tasks 2024

Expectation Alignment: Handling Reward Misspecification in the Presence of Expectation Mismatch 2024

MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence 2024