PALI-NLP at SemEval 2025 Task 1: Multimodal Idiom Recognition and Alignment

Runyang You; Xinyue Mei; Mengyuan Zhou

ACL 2025

Abstract

Understanding idioms in multimodal contexts poses significant challenges due to data scarcity, idiomatic ambiguity, and the need for effective alignment of visual and textual inputs. In this work, we introduce MIRA (Multimodal Idiom Recognition and Alignment), a training-free framework designed to address these challenges on the SemEval-2025 Task 1 (AdMIRe) benchmark. MIRA leverages powerful closed-source large language models (LLMs) and integrates three key innovations: bias correction via in-context learning, multi-step semantic-visual fusion, and a self-revision mechanism that iteratively refines its outputs through backward verification. By systematically processing and fusing multimodal inputs, MIRA generates high-quality, fine-grained image-text representations that enhance idiom comprehension across different languages and cultural contexts. Experimental evaluations in both English and Portuguese demonstrate that our approach achieves robust performance without the need for additional training, setting a new standard for multimodal idiom recognition.
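
The self-revision mechanism with backward verification can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify prompts, models, or output format, so the function names (`self_revise`, `propose`, `verify`), the ranking-of-image-indices output, and the round limit are all assumptions made for illustration. The two callables stand in for LLM calls and are replaced here by toy stubs so the loop can be run offline.

```python
from typing import Callable, List

# Hypothetical signatures (assumptions, not from the paper):
#   propose(sentence, captions, feedback) -> ranking of image indices
#   verify(sentence, captions, ranking)   -> feedback text, "" if consistent
Propose = Callable[[str, List[str], str], List[int]]
Verify = Callable[[str, List[str], List[int]], str]


def self_revise(
    propose: Propose,
    verify: Verify,
    sentence: str,
    captions: List[str],
    max_rounds: int = 3,
) -> List[int]:
    """Propose an answer, check it backward, and revise until it passes."""
    ranking = propose(sentence, captions, "")
    for _ in range(max_rounds):
        # Backward verification: does the chosen answer agree with the input?
        feedback = verify(sentence, captions, ranking)
        if not feedback:
            break  # verifier found no inconsistency
        # Revision: propose again, conditioned on the verifier's feedback.
        ranking = propose(sentence, captions, feedback)
    return ranking


# Toy stand-ins for the LLM calls, for illustration only.
def toy_propose(sentence: str, captions: List[str], feedback: str) -> List[int]:
    # First attempt prefers image 0; after feedback it swaps the order.
    return [1, 0] if feedback else [0, 1]


def toy_verify(sentence: str, captions: List[str], ranking: List[int]) -> str:
    # Pretend image 1 is the one that matches the idiomatic sense.
    return "" if ranking[0] == 1 else "top image depicts the literal sense"


result = self_revise(
    toy_propose,
    toy_verify,
    "He finally spilled the beans about the merger.",
    ["beans falling out of a jar", "a person revealing a secret"],
)
print(result)
```

Passing the proposer and verifier as callables keeps the control loop independent of any particular model or API, which is one plausible way to structure a training-free pipeline of this kind.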
