Modgenix at SemEval-2025 Task 1: Context Aware Vision Language Ranking (CAViLR) for Multimodal Idiomaticity Understanding

Joydeb Mondal; Pramir Sarkar

ACL 2025

Abstract

This paper presents CAViLR, a hybrid multimodal approach for SemEval-2025 Task 1. Our method integrates CLIP as a baseline with a Mixture of Experts (MoE) framework that dynamically selects expert models such as Pixtral-12B and Phi-3.5 based on input context. The approach addresses challenges in both image ranking and image sequence prediction, improving the alignment of visual and textual semantics. Experimental results demonstrate that our hybrid model outperforms individual models. Future work will focus on refining expert selection and enhancing disambiguation strategies for complex idiomatic expressions.
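The abstract describes the architecture only at a high level. The sketch below illustrates one possible way to wire such a hybrid ranker: a CLIP-style cosine-similarity baseline blended with scores from experts chosen by a context-dependent gate. It is not the authors' implementation. The toy embeddings, the keyword router `toy_gate`, the expert stubs (stand-ins for calls to models such as Pixtral-12B or Phi-3.5), the top-k selection rule and the 0.5 blending weight are all hypothetical assumptions.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
# Hypothetical expert interface: given the context sentence and the number
# of candidate images, return one relevance score per image. In a real
# system this would wrap a vision-language model; here it is a stub.
Expert = Callable[[str, int], List[float]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, the quantity CLIP-style models rank candidates by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_experts(gate_logits: Dict[str, float], top_k: int = 1) -> Dict[str, float]:
    """Keep the top_k experts by gate logit and renormalise their weights."""
    chosen = sorted(gate_logits, key=gate_logits.get, reverse=True)[:top_k]
    weights = softmax([gate_logits[name] for name in chosen])
    return dict(zip(chosen, weights))


def hybrid_rank(
    context: str,
    text_emb: Vector,
    image_embs: List[Vector],
    experts: Dict[str, Expert],
    gate: Callable[[str], Dict[str, float]],
    baseline_weight: float = 0.5,  # assumed blend, not taken from the paper
    top_k: int = 1,
) -> List[int]:
    """Return candidate image indices ordered best first."""
    n = len(image_embs)
    baseline = [cosine(text_emb, img) for img in image_embs]
    mixed = [0.0] * n
    for name, w in select_experts(gate(context), top_k).items():
        for i, s in enumerate(experts[name](context, n)):
            mixed[i] += w * s
    combined = [
        baseline_weight * b + (1.0 - baseline_weight) * m
        for b, m in zip(baseline, mixed)
    ]
    return sorted(range(n), key=lambda i: combined[i], reverse=True)


def toy_gate(context: str) -> Dict[str, float]:
    """Hypothetical router: prefer the literal expert only on literal cues."""
    literal_cues = ("literally", "actual")
    if any(cue in context.lower() for cue in literal_cues):
        return {"literal": 2.0, "figurative": 0.0}
    return {"literal": 0.0, "figurative": 2.0}


# Stub experts with fixed scores for three candidate images.
experts: Dict[str, Expert] = {
    "literal": lambda ctx, n: [1.0, 0.0, 0.0][:n],
    "figurative": lambda ctx, n: [0.0, 0.0, 1.0][:n],
}
text_emb = [1.0, 0.0]
image_embs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]

order = hybrid_rank("He finally kicked the bucket.", text_emb, image_embs, experts, toy_gate)
print(order)  # → [2, 0, 1]
```

In this toy run the baseline alone would rank image 0 first, but the gate routes the idiomatic sentence to the figurative expert, whose score lifts image 2 to the top. Setting `top_k` above 1 turns the hard selection into a weighted mixture of several experts.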
