2025 COLING COLING 2025

MoKA:Parameter Efficiency Fine-Tuning via Mixture of Kronecker Product Adaption

Abstract

AbstractWith the rapid development of large language models (LLMs), traditional full-parameter fine-tuning methods have become increasingly expensive in terms of computational resources and time costs. For this reason, parameter efficient fine-tuning (PEFT) methods have emerged. Among them, Low-Rank Adaptation (LoRA) is one of the current popular PEFT methods, which is widely used in large language models. However, the low-rank update mechanism of LoRA somewhat limits its ability to approximate full-parameter fine-tuning during the training process. In this paper, we propose a novel PEFT framework, MoKA (Mixture of Kronecker Product Adaptation), which combines the Kronecker product with the Mixture-of-Experts (MoE) method. By replacing the low-rank decomposition of the weight update matrix with Kronecker products and utilizing a sparse MoE architecture, MoKA achieves parameter efficiency and better model performance. Additionally, we design an efficient routing module to further compress the parameter size. We conduct extensive experiments on the GLUE benchmark, E2E NLG Challenge, and instruction tuning tasks for LLMs. The results demonstrate that MoKA outperforms existing PEFT methods.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio