Controllable Text-to-Image Synthesis for Multi-Modality MR Images

Kyuri Kim; Yoonho Na; Sung-Joon Ye; Jimin Lee; Sung Soo Ahn; Ji Eun Park; Hwiyoung Kim

2024 WACV WACV 2024

Controllable Text-to-Image Synthesis for Multi-Modality MR Images

Abstract

Generative modeling has seen significant advancements in recent years, especially in the realm of text-to-image synthesis. Despite this progress, the medical field has yet to fully leverage the capabilities of large-scale foundational models for synthetic data generation. This paper introduces a framework for text-conditional magnetic resonance (MR) imaging generation, addressing the complexities associated with multi-modality considerations. The framework comprises a pre-trained large language model, a diffusion-based prompt-conditional image generation architecture, and an additional denoising network for input structural binary masks. Experimental results demonstrate that the proposed framework is capable of generating realistic, high-resolution, and high-fidelity multi-modal MR images that align with medical language text prompts. Further, the study interprets the cross-attention maps of the generated results based on text-conditional statements. The contributions of this research lay a robust foundation for future studies in text-conditional medical image generation and hold significant promise for accelerating advancements in medical imaging research.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning and Healthcare & Medicine and Machine Learning

🧭 Keyword Pioneer — prompt conditioning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio