X-FLoRA: Cross-modal Federated Learning with Modality-expert LoRA for Medical VQA

Min Hyuk Kim; Changheon Kim; Seok Bong Yoo

EMNLP 2025

Abstract

Medical visual question answering (VQA) and federated learning (FL) have emerged as vital approaches for enabling privacy-preserving, collaborative learning across clinical institutions. However, both approaches face significant challenges in cross-modal FL scenarios, where each client possesses unpaired images from only one modality. To address this limitation, we propose X-FLoRA, a cross-modal FL framework that uses modality-expert low-rank adaptation (LoRA) for medical VQA. Specifically, X-FLoRA enables the synthesis of images from one modality to another without requiring data sharing between clients. This is achieved by training a backward translation model within a federated asymmetric translation scheme that integrates clinical semantics from textual data. Additionally, X-FLoRA introduces modality-expert LoRA, which fine-tunes separate LoRA modules to strengthen modality-specific representations in the VQA task. The server aggregates the trained backward translation models and fine-tuned LoRA modules using discriminator quality scores and expert-aware weighting, which regulate the relative contributions from different clients. Experiments were conducted on VQA datasets encompassing different medical modalities, and the results demonstrate that X-FLoRA outperforms existing FL methods in terms of VQA performance.
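The server-side aggregation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the aggregation formula, so this example assumes the simplest reading: each client's parameters are averaged with weights proportional to a per-client score. The function name `weighted_average`, the parameter name `lora_A`, and the score values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Parameter name -> flattened weights. A stand-in for real tensors (assumption).
Params = Dict[str, List[float]]


def weighted_average(client_params: List[Params], scores: List[float]) -> Params:
    """Average client parameters with weights proportional to the given scores.

    This is a generic score-weighted federated average, not necessarily the
    paper's exact rule.
    """
    if len(client_params) != len(scores) or not client_params:
        raise ValueError("need one score per client and at least one client")
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")

    # Normalize scores so the weights sum to 1.
    weights = [s / total for s in scores]

    merged: Params = {}
    for name in client_params[0]:
        size = len(client_params[0][name])
        merged[name] = [
            sum(w * p[name][i] for w, p in zip(weights, client_params))
            for i in range(size)
        ]
    return merged


# Hypothetical example: two clients, one LoRA matrix flattened to two values.
clients = [{"lora_A": [1.0, 2.0]}, {"lora_A": [3.0, 4.0]}]
quality_scores = [0.75, 0.25]  # made-up scores, not from the paper
global_params = weighted_average(clients, quality_scores)
```

With these made-up inputs, the first client contributes three times as much as the second, which is the sense in which scores "regulate the relative contributions from different clients". How the discriminator quality scores and expert-aware weights are actually computed is specified in the paper, not here.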

Authors

Min Hyuk Kim, Changheon Kim, Seok Bong Yoo

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Artificial Intelligence > Learning Paradigms > Federated Learning
Natural Language Processing > Applications > Machine Reading Comprehension
Healthcare & Medicine > Clinical > Medical Imaging
Machine Learning > Learning Types > Federated Learning
Machine Learning > Learning Paradigms > Federated Learning
Computer Vision > Core AI > Multimodal Learning
Deep Learning > Techniques > Transfer Learning

Keywords

federated learning, visual question answering, medical imaging, cross-modal learning, low-rank adaptation, medical visual question answering, modality translation
