COLING 2025
ELAINE-medLLM: Lightweight English Japanese Chinese Trilingual Large Language Model for Bio-medical Domain
Abstract
We propose ELAINE (EngLish-jApanese-chINesE)-medLLM, a trilingual (English, Japanese, Chinese) large language model adapted for the biomedical domain and based on Llama-3-8B. The training dataset was carefully curated in terms of volume and diversity to adapt the model to the biomedical domain and endow it with trilingual capability while preserving the knowledge and abilities of the base model. Training follows a two-stage path: continued pre-training followed by supervised fine-tuning (SFT). Our results demonstrate that ELAINE-medLLM exhibits superior trilingual capabilities compared to existing bilingual or multilingual medical LLMs without severely sacrificing the base model’s capability.
Topics
Machine Learning > Application Areas > Domain Adaptation
Natural Language Processing > Resources & Methods > Large Language Models
Natural Language Processing > Resources & Methods > Multilingual NLP
Artificial Intelligence > Core AI > Large Language Models
Healthcare & Medicine > Clinical > Medical AI