WACV 2026

LASOR: Towards Clinically Transparent and Explainable Ophthalmic Report Generation via Lesion-Aware Segmentation

Abstract

Automated ophthalmic report generation aims to reduce the diagnostic burden on retinal specialists by producing clinically accurate, standardized descriptions from medical imaging. However, current research remains predominantly fundus-centric and rarely exploits spatial evidence derived from optical coherence tomography (OCT), which limits clinical transparency by obscuring which anatomical regions drive diagnostic decisions. To address these limitations, we propose LASOR (Lesion-Aware Segmentation-Guided Ophthalmic Report Generation), which extracts multi-scale features to robustly capture both small focal abnormalities and broader anatomical structures, generating reliable segmentation masks that serve as spatial priors for report generation. Specifically, we use a lesion-aware patch weighting module to emphasize abnormal regions, and a curated instruction dataset that incorporates spatial mask information to enhance the model's diagnostic capabilities. In addition, we introduce a mask-guided cross-modal consistency loss that strengthens vision-language alignment between pathological regions and their diagnostic descriptions. Extensive experiments on a retinal OCT dataset covering twenty pathological conditions demonstrate state-of-the-art performance, underscoring LASOR's potential to advance clinically transparent ophthalmic report generation systems.
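
To make the idea of lesion-aware patch weighting concrete, the following is a minimal illustrative sketch, not the LASOR implementation. The abstract does not specify the weighting formula, so everything here is an assumption: the function name `patch_lesion_weights`, the `alpha` parameter, and the rule "weight = 1 + alpha × fraction of lesion pixels in the patch, then normalize" are all hypothetical. It only shows how a segmentation mask could act as a spatial prior that up-weights patches containing abnormal regions.

```python
from typing import List


def patch_lesion_weights(mask: List[List[int]], patch: int, alpha: float = 1.0) -> List[float]:
    """Return one weight per patch (row-major order), normalized to sum to 1.

    mask  : binary lesion mask (1 = lesion pixel), H x W,
            with H and W divisible by `patch`
    patch : side length of a square patch, in pixels
    alpha : hypothetical knob for how strongly lesion patches are emphasized
    """
    h, w = len(mask), len(mask[0])
    if h % patch or w % patch:
        raise ValueError("mask size must be divisible by patch size")

    raw = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Count lesion pixels inside this patch.
            lesion = sum(
                mask[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            )
            frac = lesion / (patch * patch)
            # Assumed rule: every patch keeps a base weight of 1,
            # and lesion coverage adds up to `alpha` on top of it.
            raw.append(1.0 + alpha * frac)

    total = sum(raw)
    return [x / total for x in raw]


# Toy 4x4 mask split into four 2x2 patches.
mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
weights = patch_lesion_weights(mask, patch=2, alpha=2.0)
```

In this toy case the fully lesioned top-right patch receives the largest weight, the partially lesioned bottom-right patch an intermediate one, and the two lesion-free patches share the smallest weight. Keeping a non-zero base weight is a design choice of this sketch so that normal anatomy still contributes context.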
