Multi-modal Medical Diagnosis via Large-small Model Collaboration

Wanyi Chen; Zihua Zhao; Jiangchao Yao; Ya Zhang; Jiajun Bu; Haishuai Wang

CVPR 2025

Abstract

Recent advances in medical AI show a clear trend towards large models in healthcare. However, developing large models for multi-modal medical diagnosis remains challenging because modal-complete medical data are scarce. Most existing multi-modal diagnostic models are relatively small and have limited feature extraction capability. To bridge this gap, we propose **AdaCoMed**, an **ada**ptive **co**llaborative-learning framework that integrates off-the-shelf **med**ical single-modal large models with multi-modal small models. Our framework first employs a mixture-of-modality-experts (MoME) architecture to combine features extracted from multiple single-modal medical large models, and then introduces a novel adaptive co-learning mechanism to collaborate with a multi-modal small model. This co-learning mechanism, guided by an adaptive weighting strategy, dynamically balances the complementary strengths of the MoME-fused large-model features and the cross-modal reasoning capabilities of the small model. Extensive experiments on two representative multi-modal medical datasets (MIMIC-IV-MM and MMIST ccRCC), covering six modalities and four diagnostic tasks, demonstrate consistent improvements over state-of-the-art baselines, making AdaCoMed a promising solution for real-world medical diagnosis applications.
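The two-stage idea in the abstract (gated fusion of per-modality large-model features, followed by an adaptive blend with a small model's prediction) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear gate `gate_w`, the linear head `head`, the confidence-ratio rule for `alpha`, and the random stand-in features. The actual MoME architecture and adaptive weighting strategy may differ substantially.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mome_fuse(features, gate_w):
    """Gated fusion over modality experts (hypothetical form).

    features: (M, B, D) array, one D-dim feature per modality and sample.
    gate_w:   (D, 1) array, an assumed linear gate shared across modalities.
    """
    scores = (features @ gate_w).squeeze(-1)             # (M, B)
    weights = softmax(scores, axis=0)                    # sums to 1 over modalities
    fused = (weights[..., None] * features).sum(axis=0)  # (B, D)
    return fused, weights

def adaptive_combine(logits_large, logits_small):
    """Blend two branches per sample by relative confidence (assumed rule)."""
    p_large = softmax(logits_large)
    p_small = softmax(logits_small)
    conf_large = p_large.max(axis=-1)
    conf_small = p_small.max(axis=-1)
    alpha = conf_large / (conf_large + conf_small)       # (B,), strictly in (0, 1)
    probs = alpha[:, None] * p_large + (1.0 - alpha)[:, None] * p_small
    return probs, alpha

# Random stand-ins for large-model features: 3 modalities, 4 samples, 8 dims.
rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4, 8))
gate_w = rng.normal(size=(8, 1))
head = rng.normal(size=(8, 2))            # assumed linear classifier on fused features

fused, weights = mome_fuse(features, gate_w)
logits_large = fused @ head               # large-model branch prediction
logits_small = rng.normal(size=(4, 2))    # stand-in for the small model's output
probs, alpha = adaptive_combine(logits_large, logits_small)
```

In this sketch `alpha` is computed per sample, so the blend leans towards whichever branch is more confident on that input. This is only one simple way to realise a dynamic balance between the two branches.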

Topics

Computer Vision > Domain-Specific > Medical Imaging
Deep Learning > Models > Large Language Models
Healthcare & Medicine > Clinical > Medical AI
Deep Learning > Learning Types > Multi-Modal Learning

Keywords

feature extraction, knowledge distillation, adaptive learning, multi-modal learning, collaborative learning, mixture of experts, medical diagnosis, large language model, multi-modal medical diagnosis
