GPL at SemEval-2023 Task 1: WordNet and CLIP to Disambiguate Images

Shibingfeng Zhang; Shantanu Nath; Davide Mazzaccara

SemEval 2023

Abstract

Given a word in context, the task of Visual Word Sense Disambiguation consists of selecting the correct image from a set of candidates. To select the correct image, we propose a solution that blends text augmentation with multimodal models. Text augmentation leverages the fine-grained semantic annotation from WordNet to obtain a better representation of the textual component. We then compare this sense-augmented text to the set of candidate images using the pre-trained multimodal models CLIP and ViLT. Our system was ranked 16th for the English language, achieving 68.5 points for hit rate and 79.2 for mean reciprocal rank.
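The ranking and scoring steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vectors below are hand-made stand-ins for the text and image embeddings that a model such as CLIP would produce, the function names are hypothetical, cosine similarity is an assumed way of comparing text to images, and hit rate is assumed to mean the share of instances whose top-ranked image is the gold one.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(text_vec: Sequence[float],
                    image_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return candidate image names sorted from most to least similar."""
    return sorted(image_vecs,
                  key=lambda name: cosine(text_vec, image_vecs[name]),
                  reverse=True)


def hit_rate(rankings: List[List[str]], gold: List[str]) -> float:
    """Fraction of instances whose top-ranked candidate is the gold image."""
    hits = sum(1 for r, g in zip(rankings, gold) if r and r[0] == g)
    return hits / len(gold)


def mean_reciprocal_rank(rankings: List[List[str]], gold: List[str]) -> float:
    """Average of 1/rank of the gold image (contributes 0 if it is absent)."""
    total = 0.0
    for r, g in zip(rankings, gold):
        if g in r:
            total += 1.0 / (r.index(g) + 1)
    return total / len(gold)


# Toy data: hypothetical 2-d "embeddings", not real model outputs.
instances = [
    ([1.0, 0.0], {"a.jpg": [0.9, 0.1], "b.jpg": [0.0, 1.0], "c.jpg": [0.5, 0.5]}),
    ([0.0, 1.0], {"d.jpg": [1.0, 0.2], "e.jpg": [0.1, 1.0], "f.jpg": [0.6, 0.8]}),
]
gold = ["a.jpg", "f.jpg"]

rankings = [rank_candidates(text, images) for text, images in instances]
print(hit_rate(rankings, gold))              # 0.5
print(mean_reciprocal_rank(rankings, gold))  # 0.75
```

In the toy data the gold image is ranked first in one instance and second in the other, so the hit rate is 0.5 and the mean reciprocal rank is (1 + 1/2) / 2 = 0.75. The reported scores of 68.5 and 79.2 correspond to these quantities expressed as percentages.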

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Core Methods > Classification
Natural Language Processing > Understanding > Semantic Analysis
Natural Language Processing > Resources & Methods > Multilingual NLP

Keywords

multimodal learning, image matching, multimodal model, semantic annotation, image-text matching, text augmentation, visual word sense disambiguation, clip vision language, wordnet synset
