A Novel Perspective for Multi-Modal Multi-Label Skin Lesion Classification

Yuan Zhang; Yutong Xie; Hu Wang; Jodie C Avery; M Louise Hull; Gustavo Carneiro

WACV 2025

Abstract

The efficacy of deep learning-based Computer-Aided Diagnosis (CAD) methods for skin diseases relies on analyzing multiple data modalities (i.e., clinical and dermoscopic images plus patient metadata) and on addressing the challenges of multi-label classification. Current approaches tend to rely on limited multi-modal techniques and treat the multi-label problem as multiple multi-class problems, overlooking issues related to imbalanced learning and multi-label correlation. This paper introduces SkinM2Former, a Skin Lesion Classifier built on a Multi-modal Multi-label TransFormer-based model. For multi-modal analysis, we introduce the Tri-Modal Cross-attention Transformer (TMCT), which fuses the three modalities (clinical images, dermoscopic images, and patient metadata) at various feature levels of a transformer encoder. For multi-label classification, we introduce a multi-head attention (MHA) module to learn multi-label correlations, complemented by an optimisation that handles multi-label and imbalanced learning problems. SkinM2Former achieves a mean average accuracy of 77.27% and a mean diagnostic accuracy of 77.85% on the public Derm7pt dataset, outperforming state-of-the-art (SOTA) methods.
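To make the idea of cross-attention fusion concrete, the following is a minimal NumPy sketch of one modality's tokens attending to tokens from the other modalities. It is an illustration under stated assumptions, not the TMCT module from the paper: the token counts, the feature width `d`, the random projection matrices, the choice of which modality acts as the query, and the helper name `cross_attention` are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (hypothetical helper).

    Queries come from one modality; keys and values come from another.
    """
    q = query_tokens @ w_q
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row sums to 1
    return weights @ v, weights


# Toy inputs: all shapes and values below are assumptions for illustration.
rng = np.random.default_rng(0)
d = 8                                   # assumed shared feature width
clinical = rng.normal(size=(4, d))      # 4 clinical-image tokens
dermoscopic = rng.normal(size=(4, d))   # 4 dermoscopic-image tokens
metadata = rng.normal(size=(1, d))      # 1 embedded metadata token

w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

# Clinical tokens query the concatenated dermoscopic and metadata tokens.
context = np.concatenate([dermoscopic, metadata], axis=0)
fused, attn = cross_attention(clinical, context, w_q, w_k, w_v)
```

Here `fused` has one row per clinical token, each a weighted mixture of the dermoscopic and metadata values. A tri-modal design would presumably repeat this pattern with each modality taking a turn as the query, and at several encoder depths; how the paper arranges this is not specified in the abstract.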

Authors

Yuan Zhang, Yutong Xie, Hu Wang, Jodie C Avery, M Louise Hull, Gustavo Carneiro

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Application Areas > Domain Adaptation
Deep Learning > Architectures > Transformers

Keywords

transformer architecture, multi-label classification, multi-modal learning, medical image analysis, skin lesion classification
