Relaxing Binary Constraints in Contrastive Vision-Language Medical Representation Learning

Xiaoyang Wei; Camille Kurtz; Florence Cloppet

WACV 2025

Abstract

By aligning the embeddings of paired images and captions, contrastive vision-language representation learning has made significant advances, as illustrated by CLIP, allowing visual encoders to learn from textual supervision and vice versa. Benefiting from millions of image-caption pairs collected from the Internet, CLIP-like models show competitive performance against fully supervised baselines. However, the learned visual representations are still undermined by a binary constraint: most contrastive learning frameworks assume a strict one-to-one correspondence between the input pairs and optimize the models with the InfoNCE loss function. The embeddings of paired images and texts are aligned, while those of unpaired images and texts are pushed away from each other. In fact, these negative pairs naturally contain many "false negatives", since unpaired data can also be highly similar. In this work, we aim to overcome the impact of false negatives in vision-language representation learning by introducing soft targets that estimate the similarity between unpaired images and texts using external semantic knowledge structured in the form of graphs. The benefit of this method is demonstrated in the application context of medical imaging.
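To make the contrast between the binary constraint and soft targets concrete, here is a minimal sketch in plain Python of the image-to-text direction of an InfoNCE-style loss, computed once with one-hot targets and once with soft targets. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how soft targets are derived from the knowledge graph, so the `prior` matrix, the `soft_targets` helper, and the `sharpness` and `temperature` values are all hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(targets, logits, temperature=0.07):
    """Mean cross-entropy between target rows and softmax(logits / temperature).

    With one-hot targets this is the image-to-text half of an InfoNCE loss.
    """
    loss = 0.0
    for t_row, l_row in zip(targets, logits):
        probs = softmax([v / temperature for v in l_row])
        # Skip zero targets so they contribute nothing to the sum.
        loss -= sum(t * math.log(p) for t, p in zip(t_row, probs) if t > 0)
    return loss / len(logits)

def hard_targets(n):
    """Binary constraint: only the paired caption counts as a positive."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def soft_targets(prior, sharpness=5.0):
    """Hypothetical helper: turn a prior similarity matrix into target rows.

    `prior[i][j]` stands in for a graph-derived similarity between image i
    and caption j; how the paper computes it is not stated in the abstract.
    """
    return [softmax([sharpness * v for v in row]) for row in prior]

# Cosine similarities between 3 images (rows) and 3 captions (columns).
# Pairs 0 and 1 are assumed to describe nearly the same finding.
sim = [[0.9, 0.8, 0.1],
       [0.8, 0.9, 0.1],
       [0.1, 0.2, 0.9]]

# Assumed external knowledge: items 0 and 1 are semantically close.
prior = [[1.0, 0.9, 0.0],
         [0.9, 1.0, 0.0],
         [0.0, 0.0, 1.0]]

hard_loss = cross_entropy(hard_targets(3), sim)
soft_loss = cross_entropy(soft_targets(prior), sim)
print(f"hard-target loss: {hard_loss:.4f}")
print(f"soft-target loss: {soft_loss:.4f}")
```

Under the hard targets, the high similarity between image 0 and caption 1 is treated purely as an error to be pushed down. The soft targets instead assign that pair a share of the probability mass, so the model is no longer asked to separate two items that the prior considers close.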

Authors

Xiaoyang Wei, Camille Kurtz, Florence Cloppet

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Core Methods > Representation Learning
Machine Learning > Learning Types > Contrastive Learning

Keywords

representation learning, contrastive learning, medical imaging, vision-language model, soft target
