AAAI 2025

Advancing Medical Multimodal Learning and Data Generation with Diffusion Model and LLM

Abstract

Synthesizing electronic health records (EHR) is essential for addressing data scarcity, bias, and fairness in healthcare models. EHR data are inherently multimodal and sequential, encompassing structured codes, clinical notes, medical images, and irregular time intervals. Traditional generative models such as GANs and VAEs struggle to capture these complexities, while diffusion-based models offer improvements but remain limited to task-specific applications. To address these challenges, two diffusion-based models, MedDiffusion and EHRPD, have been developed. MedDiffusion enhances health risk prediction by generating synthetic patient data and capturing visit-level relationships, while EHRPD generates sequential, multimodal EHR data, incorporating temporal interval estimation to improve diversity and fidelity. Future work aims to overcome limitations in multimodal data generation by developing a generalized model capable of handling diverse modalities simultaneously, expanding the applicability of EHR data generation across healthcare tasks.


Authors