COLING 2024

Multimodal Cross-lingual Phrase Retrieval

Abstract

Cross-lingual phrase retrieval aims to retrieve parallel phrases across languages. Current approaches deal only with the textual modality, and both multimodal data resources and explorations of multimodal cross-lingual phrase retrieval (MXPR) are lacking. In this paper, we create the first MXPR data resource and propose a novel approach for MXPR to explore the effectiveness of multi-modality. The MXPR data resource is built by combining the benchmark dataset for textual cross-lingual phrase retrieval with Wikimedia Commons, a media repository containing a large volume of text and related images. In the resulting resource, the phrase pairs of the textual benchmark dataset are paired with their related images. Based on this data resource, we introduce a strategy that bridges the gap between modalities through multimodal relation generation with a large multimodal pre-trained model and through consistency training. Experiments on the benchmark dataset, which covers eight language pairs, show that our MXPR approach, which handles multimodal phrases, performs significantly better than purely textual cross-lingual phrase retrieval.
