Multi-modal Concept Alignment Pre-training for Generative Medical Visual Question Answering

Quan Yan; Junwen Duan; Jianxin Wang

ACL 2024

Abstract

Medical Visual Question Answering (Med-VQA) seeks to accurately respond to queries regarding medical images, a task particularly challenging for open-ended questions. This study introduces the Multi-modal Concept Alignment Pre-training (MMCAP) approach for generative Med-VQA, leveraging a knowledge graph sourced from medical image-caption datasets and the Unified Medical Language System. MMCAP advances the fusion of visual and textual medical knowledge via a graph attention network and a transformer decoder. Additionally, it incorporates a Type Conditional Prompt in the fine-tuning phase, markedly boosting the accuracy and relevance of answers to open-ended questions. Our tests on benchmark datasets illustrate MMCAP’s superiority over existing methods, demonstrating its high efficiency in data-limited settings and effective knowledge-image alignment capability.
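
For readers unfamiliar with the graph attention component named in the abstract, the following is a minimal sketch of a generic single-head graph attention layer in NumPy. It is not the authors' implementation: the function name `graph_attention_layer`, the toy four-node graph, the dimensions, and the random features are all assumptions made for illustration, and the paper's actual knowledge graph and model configuration are not described on this page.

```python
import numpy as np


def graph_attention_layer(h, adj, W, a, slope=0.2):
    """Generic single-head graph attention (illustrative, hypothetical helper).

    h:   (N, F) node features, e.g. embeddings of concept nodes
    adj: (N, N) 0/1 adjacency matrix that includes self-loops
    W:   (F, D) shared linear projection
    a:   (2*D,) attention vector
    """
    z = h @ W                                  # project nodes: (N, D)
    d = z.shape[1]
    # Score e[i, j] = LeakyReLU(a^T [z_i || z_j]), computed as the sum of a
    # source part (depends on i) and a target part (depends on j).
    src = z @ a[:d]                            # (N,)
    dst = z @ a[d:]                            # (N,)
    e = src[:, None] + dst[None, :]            # (N, N)
    e = np.where(e > 0, e, slope * e)          # LeakyReLU
    e = np.where(adj > 0, e, -np.inf)          # ignore non-neighbours
    e = e - e.max(axis=1, keepdims=True)       # stabilise the softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha                    # aggregated features, weights


# Toy example: a 4-node chain graph with self-loops (assumed data).
rng = np.random.default_rng(0)
N, F, D = 4, 8, 5
h = rng.normal(size=(N, F))
adj = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
])
W = rng.normal(size=(F, D))
a = rng.normal(size=2 * D)
out, alpha = graph_attention_layer(h, adj, W, a)
```

Each row of `alpha` is a probability distribution over that node's neighbours, so a node's output is a weighted average of the projected features of the nodes it is connected to. In a real model, `W` and `a` would be learned parameters rather than random draws.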

Authors

Quan Yan, Junwen Duan, Jianxin Wang

Topics

Machine Learning > Core Methods > Representation Learning
Computer Vision > Domain-Specific > Medical Imaging
Natural Language Processing > Applications > Machine Reading Comprehension
Healthcare & Medicine > Clinical > Medical Imaging
Deep Learning > Learning Types > Multi-Modal Learning
Deep Learning > Models > Vision-Language Models

Keywords

multi-modal learning
knowledge graph
graph attention network
transformer decoder
concept alignment
medical visual question answering
multi-modal concept alignment
