From Sights to Insights: Towards Summarization of Multimodal Clinical Documents

Akash Ghosh; Mohit Singh Tomar; Abhisek Tiwari; Sriparna Saha; Jatin Avinash Salve; Setu Sinha

ACL 2024

Abstract

The advancement of Artificial Intelligence is pivotal in reshaping healthcare, enhancing diagnostic precision, and facilitating personalized treatment strategies. One major challenge for healthcare professionals is quickly navigating through long clinical documents to provide timely and effective solutions. Doctors often struggle to draw quick conclusions from these extensive documents. To address this issue and save time for healthcare professionals, an effective summarization model is essential. Most current models assume the data is only text-based. However, patients often include images of their medical conditions in clinical documents. To effectively summarize these multimodal documents, we introduce EDI-Summ, an innovative Image-Guided Encoder-Decoder Model. This model uses modality-aware contextual attention on the encoder and an image cross-attention mechanism on the decoder, enhancing the BART base model to create detailed visual-guided summaries. We have tested our model extensively on three multimodal clinical benchmarks involving multimodal question and dialogue summarization tasks. Our analysis demonstrates that EDI-Summ outperforms state-of-the-art large language and vision-aware models in these summarization tasks. Disclaimer: The work includes vivid medical illustrations, depicting the essential aspects of the subject matter.
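The abstract names an image cross-attention mechanism on the decoder but gives no formulation. As a rough illustration only, the sketch below shows generic single-head scaled dot-product cross-attention in which text states act as queries over image patch features, followed by a residual addition. It is not the authors' implementation: the function names, the toy vectors, the omission of learned projection matrices, and the residual fusion are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def image_cross_attention(text_states, image_feats):
    """Hypothetical helper: each text state attends over image features.

    text_states: list of query vectors (one per text token).
    image_feats: list of vectors used as both keys and values
                 (one per image patch), same dimension as the queries.
    Learned query/key/value projections are omitted for brevity.
    Returns (fused_states, attention_weights).
    """
    dim = len(image_feats[0])
    scale = math.sqrt(dim)
    fused, weights = [], []
    for query in text_states:
        # Scaled dot product between the query and every image feature.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in image_feats
        ]
        attn = softmax(scores)
        # Attention-weighted sum of the image features.
        context = [
            sum(attn[j] * image_feats[j][d] for j in range(len(image_feats)))
            for d in range(dim)
        ]
        # Residual connection: keep the text state, add the visual context.
        fused.append([q + c for q, c in zip(query, context)])
        weights.append(attn)
    return fused, weights


# Toy inputs (made up for illustration): two text tokens, three image patches.
text_states = [[1.0, 0.0], [0.0, 1.0]]
image_feats = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
fused, weights = image_cross_attention(text_states, image_feats)
```

In this toy setup each text token puts the most weight on the patch whose feature vector points in the same direction as its own state. A practical model would use learned projections and multiple heads, for example through an existing deep learning library.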

Topics

Machine Learning > Core Methods > Representation Learning; Deep Learning > Architectures > Transformers; Computer Vision > Domain-Specific > Medical Imaging; Natural Language Processing > Applications > Summarization; Healthcare & Medicine > Clinical > Medical AI; Deep Learning > Learning Types > Multi-Modal Learning

Keywords

medical imaging; multimodal learning; text summarization; clinical document; image-guided summarization
