KiUT: Knowledge-Injected U-Transformer for Radiology Report Generation

Zhongzhen Huang; Xiaofan Zhang; Shaoting Zhang

CVPR 2023

Abstract

Radiology report generation aims to automatically generate a clinically accurate and coherent paragraph from an X-ray image, which could relieve radiologists of the heavy burden of report writing. Although various image captioning methods have shown remarkable performance on natural images, generating accurate reports for medical images requires knowledge of multiple modalities, including vision, language, and medical terminology. We propose a Knowledge-injected U-Transformer (KiUT) that learns multi-level visual representations and adaptively distills this information, together with contextual and clinical knowledge, for word prediction. Specifically, a U-connection schema between the encoder and decoder models interactions between different modalities, while a symptom graph and an injected knowledge distiller assist report generation. Experimentally, we outperform state-of-the-art methods on two widely used benchmark datasets: IU-Xray and MIMIC-CXR. Further experiments demonstrate the advantages of our architecture and the complementary benefits of the injected knowledge.
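
The abstract names the U-connection and the injected knowledge distiller without describing their internals. The following is a minimal pure-Python sketch of how such components might be wired, not the authors' implementation. `u_connection_pairs` assumes a U-Net-style mirrored routing in which each decoder layer reads one encoder level, and `distill` substitutes a plain softmax gate over visual, contextual, and clinical vectors for the paper's distiller. Both function names are hypothetical, and the symptom graph is not modeled here.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def u_connection_pairs(num_layers: int) -> List[Tuple[int, int]]:
    """Pair each decoder layer with one encoder layer (hypothetical routing).

    Assumption: a U-Net-style mirror, so decoder layer 0 reads the deepest
    encoder output and the last decoder layer reads the shallowest one.
    """
    return [(d, num_layers - 1 - d) for d in range(num_layers)]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distill(query: Vector, sources: Dict[str, Vector]) -> Tuple[Vector, Dict[str, float]]:
    """Fuse knowledge vectors with a softmax gate (stand-in for the distiller).

    Each source (e.g. visual, contextual, clinical) is scored by a scaled dot
    product with the decoder query; the fused vector is their weighted sum.
    """
    names = list(sources)
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * s for q, s in zip(query, sources[n])) / scale for n in names])
    fused = [
        sum(w * sources[n][k] for w, n in zip(weights, names))
        for k in range(len(query))
    ]
    return fused, dict(zip(names, weights))


if __name__ == "__main__":
    print(u_connection_pairs(3))  # [(0, 2), (1, 1), (2, 0)]
    fused, weights = distill(
        [1.0, 0.0],
        {"visual": [1.0, 0.0], "contextual": [0.0, 1.0], "clinical": [0.0, 0.0]},
    )
    print(weights)
```

Softmax gating is only one plausible design choice. It keeps the per-step weights interpretable as a mixture over knowledge sources, which matches the abstract's "adaptively distill" phrasing without claiming to reproduce the paper's exact mechanism.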

Topics

Deep Learning > Architectures > Transformers; Computer Vision > Generation > Image Captioning; Computer Vision > Domain-Specific > Medical Imaging; Natural Language Processing > Generation > Text Generation; Healthcare & Medicine > Clinical > Medical Imaging

Keywords

medical imaging; knowledge distillation; multi-modal learning; vision-language model; radiology report generation; knowledge injection; symptom graph; medical terminology; X-ray image; multi-level visual representation
