Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning

Chong Ma; Hanqi Jiang; Wenting Chen; Yiwei Li; Zihao Wu; Xiaowei Yu; Zhengliang Liu; Lei Guo; Dajiang Zhu; Tuo Zhang; Dinggang Shen; Tianming Liu; Xiang Li

NeurIPS 2024

Abstract

In medical multi-modal frameworks, aligning cross-modality features is a significant challenge. Existing works learn features that are only implicitly aligned from the data, without considering the explicit relationships that hold in the medical context. This reliance on data alone may limit how well the learned alignment generalizes. In this work, we propose the Eye-gaze Guided Multi-modal Alignment (EGMA) framework, which harnesses eye-gaze data to better align medical visual and textual features. We explore the natural auxiliary role of radiologists' eye-gaze data in aligning medical images and text, and introduce a novel approach that uses eye-gaze data collected synchronously from radiologists during diagnostic evaluations. We evaluate on the downstream tasks of image classification and image-text retrieval across four medical datasets, where EGMA achieves state-of-the-art performance and stronger generalization across datasets. Additionally, we explore how varying amounts of eye-gaze data affect model performance, highlighting the feasibility and utility of integrating this auxiliary data into multi-modal alignment frameworks.
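
To make the notion of "implicitly aligned" image and text features concrete, the sketch below shows a generic symmetric contrastive objective of the kind commonly used to align paired embeddings, together with nearest-neighbor image-to-text retrieval. It is an illustration only and is not the EGMA method: the function names, the temperature value, and the random embeddings are assumptions, and no eye-gaze signal is modeled here.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Generic image-text contrastive loss (hypothetical helper, not from the paper).

    Row i of image_emb and row i of text_emb are assumed to be a matched pair;
    all other rows in the batch act as negatives.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return 0.5 * (loss_i2t + loss_t2i)

def retrieve_text(image_emb, text_emb):
    """Return, for each image, the index of the most similar text embedding."""
    sims = l2_normalize(image_emb) @ l2_normalize(text_emb).T
    return sims.argmax(axis=1)

# Toy data: random vectors stand in for encoder outputs (an assumption).
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(8, 16))
aligned_text = image_emb.copy()        # perfectly aligned pairs
misaligned_text = image_emb[::-1]      # every pair deliberately mismatched

loss_aligned = symmetric_contrastive_loss(image_emb, aligned_text)
loss_misaligned = symmetric_contrastive_loss(image_emb, misaligned_text)
retrieved = retrieve_text(image_emb, aligned_text)
```

In this toy setup the loss is lower for the matched pairs than for the mismatched ones. An objective of this form learns the pairing purely from batch statistics, which is the data-only alignment the abstract contrasts with explicit guidance from eye-gaze data.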

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Core Methods > Representation Learning
Computer Vision > Domain-Specific > Medical Imaging
Healthcare & Medicine > Research > Bioinformatics
Healthcare & Medicine > Research > Medical AI
Computer Vision > Core AI > Multimodal Learning
Machine Learning > Learning Types > Multi-Modal Learning
Deep Learning > Learning Types > Multi-Modal Learning

Keywords

representation learning, medical imaging, multi-modal learning, eye tracking, cross-modal alignment, multi-modal alignment, medical representation, eye-gaze datum, image-text retrieval, cross-modality feature
