CTYUN-AI at SemEval-2025 Task 1: Learning to Rank for Idiomatic Expressions
Abstract
We propose a multimodal framework that integrates textual context with image-caption analysis, combining systematic data augmentation with parameter-efficient fine-tuning. Our approach features: (1) option shuffling to eliminate positional bias, (2) lexical augmentation through synonym replacement and back-translation, and (3) optimized cross-modal ranking adaptation. The system ranks first in Portuguese (Top-1 Acc: 0.92) and second in English (Top-1 Acc: 0.87) on CodaBench. Experiments with models ranging from 7B to 72B parameters show that 32B models achieve the best balance between capacity and trainability, while the larger 72B models suffer from overfitting. The results also reveal the limitations of GPT-4 knowledge distillation and underscore the importance of controlled data augmentation for learning idiomatic language, advancing multimodal figurative language processing.
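Option shuffling (item 1) and the Top-1 accuracy metric can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each training example is a list of candidate image captions plus a gold ranking given as option indices ordered from most to least appropriate. The function names `shuffle_options`, `augment`, and `top1_accuracy` are hypothetical and do not come from the paper. The key point is that when the options are permuted, the gold ranking must be rewritten in terms of the new positions so that the correct answer is no longer tied to a fixed slot.

```python
import random
from typing import List, Tuple


def shuffle_options(captions: List[str], gold_ranking: List[int],
                    seed: int) -> Tuple[List[str], List[int]]:
    """Permute candidate captions and remap the gold ranking accordingly.

    Hypothetical data format: gold_ranking lists option indices from
    most to least appropriate.
    """
    rng = random.Random(seed)
    perm = list(range(len(captions)))
    rng.shuffle(perm)  # perm[new_pos] = old_pos
    new_captions = [captions[old] for old in perm]
    old_to_new = {old: new for new, old in enumerate(perm)}
    # Rewrite each gold index in terms of its position after shuffling.
    new_ranking = [old_to_new[old] for old in gold_ranking]
    return new_captions, new_ranking


def augment(captions: List[str], gold_ranking: List[int],
            n_copies: int = 3, base_seed: int = 0):
    """Produce n_copies differently ordered versions of one example."""
    return [shuffle_options(captions, gold_ranking, base_seed + k)
            for k in range(n_copies)]


def top1_accuracy(predicted: List[List[int]], gold: List[List[int]]) -> float:
    """Fraction of examples whose top predicted option matches the gold top option."""
    hits = sum(p[0] == g[0] for p, g in zip(predicted, gold))
    return hits / len(gold)


if __name__ == "__main__":
    caps = ["cap A", "cap B", "cap C", "cap D", "cap E"]
    gold = [2, 0, 4, 1, 3]
    for new_caps, new_rank in augment(caps, gold):
        # The caption sequence implied by the ranking is unchanged by shuffling.
        print([new_caps[i] for i in new_rank])
```

Because the remapped ranking always points to the same captions in the same order, each shuffled copy is a valid training example, and a model can no longer score well by favoring a particular option position.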