CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment

Sajid Javed; Arif Mahmood; Iyyakutti Iyappan Ganapathi; Fayaz Ali Dharejo; Naoufel Werghi; Mohammed Bennamoun

CVPR 2024

Abstract

This paper proposes Comprehensive Pathology Language Image Pre-training (CPLIP), a new unsupervised technique designed to enhance the alignment of images and text in histopathology for tasks such as classification and segmentation. This methodology enriches vision-language models by leveraging extensive data without needing ground-truth annotations. CPLIP involves constructing a pathology-specific dictionary, generating textual descriptions for images using language models, and retrieving relevant images for each text snippet via a pre-trained model. The model is then fine-tuned using a many-to-many contrastive learning method to align complex, interrelated concepts across both modalities. Evaluated across multiple histopathology tasks, CPLIP shows notable improvements in zero-shot learning scenarios, outperforming existing methods in both interpretability and robustness and setting a higher benchmark for the application of vision-language models in the field. To encourage further research and replication, the code for CPLIP is available on GitHub at https://cplip.github.io/
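The many-to-many contrastive step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the loss, so what follows is a generic multi-positive InfoNCE objective, not necessarily the authors' formulation. The function names, the boolean positive mask `pos`, and the temperature value 0.07 are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def masked_logsumexp(a, axis, mask=None):
    """Numerically stable log-sum-exp over `axis`, optionally restricted to `mask`."""
    if mask is not None:
        a = np.where(mask, a, -np.inf)
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)

def many_to_many_contrastive_loss(img, txt, pos, temperature=0.07):
    """Symmetric multi-positive InfoNCE loss (illustrative, hypothetical helper).

    img: (N, d) image embeddings; txt: (M, d) text embeddings.
    pos: (N, M) boolean mask; pos[i, j] is True when image i and text j
         are treated as a matching pair. Each image may match several
         texts and each text several images.
    """
    pos = np.asarray(pos, dtype=bool)
    if not (pos.any(axis=1).all() and pos.any(axis=0).all()):
        raise ValueError("every image and every text needs at least one positive")
    logits = l2_normalize(img) @ l2_normalize(txt).T / temperature
    # Image-to-text: probability mass on all positive texts of each image.
    loss_i2t = masked_logsumexp(logits, 1) - masked_logsumexp(logits, 1, pos)
    # Text-to-image: probability mass on all positive images of each text.
    loss_t2i = masked_logsumexp(logits, 0) - masked_logsumexp(logits, 0, pos)
    return 0.5 * (loss_i2t.mean() + loss_t2i.mean())

# Toy usage: 3 images, 4 texts; image 0 matches texts 0 and 1.
rng = np.random.default_rng(0)
images = rng.normal(size=(3, 8))
texts = rng.normal(size=(4, 8))
positives = np.array([[1, 1, 0, 0],
                      [0, 0, 1, 0],
                      [0, 0, 0, 1]], dtype=bool)
loss = many_to_many_contrastive_loss(images, texts, positives)
```

With a single positive per row and column this reduces to the standard CLIP-style one-to-one contrastive loss. The mask is what lets several texts share an image and several images share a text.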

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Learning Types > Contrastive Learning
Healthcare & Medicine > Clinical > Medical Imaging
Artificial Intelligence > Learning Paradigms > Zero-Shot Learning
Deep Learning > Learning Types > Contrastive Learning
Deep Learning > Learning Types > Zero-Shot Learning

Keywords

unsupervised learning, contrastive learning, zero-shot learning, vision-language alignment, vision-language model, image-text alignment, pathology-specific dictionary
