M3UCD: A Multi-task Multimodal Metaphor Understanding Challenge Dataset for LLMs

Tianlong Zheng; Yating Yang; Rui Dong; Bo Ma; Lei Wang; Xi Zhou; Siru Miao; Turghun Osman

2026 AAAI AAAI 2026

M3UCD: A Multi-task Multimodal Metaphor Understanding Challenge Dataset for LLMs

Abstract

Abstract Understanding multimodal metaphors represents a crucial pathway for machines to comprehend human cognition. However, current research remains constrained by superficial dataset annotations, insufficient systematic evaluation of large language models, and fragmented task frameworks. To bridge these gaps, the paper proposes a systematic solution featuring: (I) We present the largest fine-grained Multi-task Multimodal Metaphor Understanding Challenge Dataset (M3UCD) built via multi-perspective collaborative annotation. It contains 15,345 samples, each annotated with 12 manual attribute labels. (II) Systematic benchmarking of LLMs' capacity boundaries in metaphor understanding. Evaluation results reveal the persistent challenges LLMs face in this domain while validating M3UCD's effectiveness and potential. (III) A concise and unified multi-task baseline framework was developed and demonstrated its effectiveness in enhancing the metaphor understanding capabilities of MLLMs.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Tianlong Zheng , Yating Yang , Rui Dong , Bo Ma , Lei Wang , Xi Zhou , Siru Miao , Turghun Osman

Topics

Artificial Intelligence > Core AI > Multimodal Learning Natural Language Processing > Applications > Text Classification

Keywords

multi-task learning multimodal learning fine-grained classification metaphor understanding large language model

Download PDF

Related papers

Hi-EF: Benchmarking Emotion Forecasting in Human-interaction 2026

MosaicDoc: A Large-Scale Bilingual Benchmark for Visually Rich Document Understanding 2026

Sparse3DPR: Training-Free 3D Hierarchical Scene Parsing and Task-Adaptive Subgraph Reasoning from Sparse RGB Views 2026

LayerEdit: Disentangled Multi-Object Editing via Conflict-Aware Multi-Layer Learning 2026

HDGS: Hierarchical Dynamic Gaussian Splatting for Urban Driving Scenes 2026