Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning

Fuying Wang; Yuyin Zhou; Shujun WANG; Varut Vardhanabhuti; Lequan Yu

NeurIPS 2022

Abstract

Learning medical visual representations directly from paired radiology reports has become an emerging topic in representation learning. However, existing medical image-text joint learning methods are limited to instance-level or local supervision and ignore disease-level semantic correspondences. In this paper, we present a novel Multi-Granularity Cross-modal Alignment (MGCA) framework for generalized medical visual representation learning that harnesses the naturally exhibited semantic correspondences between medical images and radiology reports at three different levels, i.e., pathological region-level, instance-level, and disease-level. Specifically, we first incorporate an instance-wise alignment module that maximizes the agreement between image-report pairs. Further, for token-wise alignment, we introduce a bidirectional cross-attention strategy to explicitly learn the matching between fine-grained visual tokens and text tokens, followed by contrastive learning to align them. More importantly, to leverage high-level inter-subject semantic correspondences (e.g., disease), we design a novel cross-modal disease-level alignment paradigm that enforces cross-modal cluster assignment consistency. Extensive experimental results on seven downstream medical image datasets covering image classification, object detection, and semantic segmentation tasks demonstrate the stable and superior performance of our framework.
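The instance-wise alignment step mentioned in the abstract can be illustrated with a small sketch. The abstract only says that agreement between image-report pairs is maximized; the symmetric InfoNCE-style loss below, the temperature value, the function names, and the toy embeddings are all assumptions made for illustration, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_on_diagonal(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def instance_alignment_loss(image_embs, report_embs, temperature=0.1):
    """Hypothetical symmetric contrastive loss over paired embeddings.

    image_embs[i] and report_embs[i] are assumed to come from the same study.
    The temperature of 0.1 is an illustrative choice, not a value from the paper.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in report_embs]
    # Cosine similarity of every image with every report, scaled by temperature.
    sim = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
           for img in imgs]
    sim_t = [list(col) for col in zip(*sim)]  # report-to-image direction
    return 0.5 * (cross_entropy_on_diagonal(sim) + cross_entropy_on_diagonal(sim_t))


# Toy embeddings (made up): each report roughly matches its own image.
images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
reports = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

matched = instance_alignment_loss(images, reports)
shuffled = instance_alignment_loss(images, reports[1:] + reports[:1])
print(matched, shuffled)
```

In this toy setting the correctly paired batch gives a lower loss than the batch whose reports were rotated by one position, which is the behaviour a contrastive alignment objective is meant to reward.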

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Learning Types > Contrastive Learning
Computer Vision > Domain-Specific > Medical Imaging
Healthcare & Medicine > Clinical > Medical Imaging
Deep Learning > Learning Types > Contrastive Learning
Deep Learning > Learning Types > Multi-Modal Learning

Keywords

representation learning, contrastive learning, cross-modal alignment, semantic correspondence, radiology report, medical image, medical visual representation
