PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions

Mahesh Bhosale; Abdul Wasi; Yuanhao Zhai; Yunjie Tian; Samuel Border; Nan Xi; Pinaki Sarder; Junsong Yuan; David Doermann; Xuan Gong

ICCV 2025

Abstract

Diffusion-based generative models have shown promise in synthesizing histopathology images to address data scarcity caused by privacy constraints. Diagnostic text reports provide high-level semantic descriptions, and masks offer fine-grained spatial structures essential for representing distinct morphological regions. However, public datasets lack paired text and mask data for the same histopathological images, limiting their joint use in image generation. This constraint restricts the ability to fully exploit the benefits of combining both modalities for enhanced control over semantics and spatial details. To overcome this, we propose PathDiff, a diffusion framework that effectively learns from unpaired mask-text data by integrating both modalities into a unified conditioning space. PathDiff allows precise control over structural and contextual features, generating high-quality, semantically accurate images. PathDiff also improves image fidelity, text-image alignment, and faithfulness, enhancing data augmentation for downstream tasks like nuclei segmentation and classification. Extensive experiments demonstrate its superiority over existing methods. Our code is publicly available at https://github.com/bhosalems/PathDiff.

Topics

Machine Learning > Application Areas > Data Augmentation
Deep Learning > Models > Diffusion Models
Computer Vision > Generation > Image Generation
Computer Vision > Domain-Specific > Medical Imaging

Keywords

medical imaging, data augmentation, image synthesis, text-to-image generation, diffusion model, histopathology image, unpaired learning, histopathology image synthesis
