Bootstrapping a high quality multilingual multimodal dataset for Bletchley

Owais Khan Mohammed; Kriti Aggarwal; Qiang Liu; Saksham Singhal; Johan Bjorck; Subhojit Som

ACML 2022

Abstract

Vision-language models have recently made impressive strides, primarily driven by large-scale training on web data. While pioneering works such as CLIP and ALIGN show significant improvements, they focus on English data, which is easy to source from the web. To serve non-English-speaking demographics, we consider various methods for generating multilingual data and find that a simple bootstrapping mechanism works surprisingly well. Specifically, using only English image-caption data and text-only multilingual translation pairs, we train a fairly strong multilingual vision-language model and then leverage it to create a much cleaner version of the multilingual image-caption dataset we collected. We demonstrate that this dataset, which was used to train Bletchley, yields a multimodal, multilingual model that performs strongly across several multilingual zero-shot tasks. Specifically, Bletchley achieves state-of-the-art results on the multilingual COCO, Multi30k, IGLUE WIT, and xFlickr&CO datasets.
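The abstract does not say how the bootstrapped model is used to clean the collected data. One common approach, assumed here purely for illustration, is to score each image-caption pair by the cosine similarity of its image and text embeddings and keep only pairs above a threshold. In the sketch below, the function names, the toy embeddings, and the 0.3 threshold are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# (caption, image embedding, text embedding); in practice the embeddings
# would come from the bootstrapped multilingual vision-language model.
Pair = Tuple[str, Vector, Vector]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pairs(pairs: List[Pair], threshold: float = 0.3) -> List[str]:
    """Keep captions whose text embedding is close to the image embedding.

    The threshold is a hypothetical value chosen for this toy example.
    """
    kept = []
    for caption, image_emb, text_emb in pairs:
        if cosine_similarity(image_emb, text_emb) >= threshold:
            kept.append(caption)
    return kept


# Toy data: the first caption matches its image, the second is web noise.
toy_pairs: List[Pair] = [
    ("ein Hund am Strand", [1.0, 0.0], [0.9, 0.1]),
    ("jetzt kaufen", [1.0, 0.0], [0.0, 1.0]),
]
clean_captions = filter_pairs(toy_pairs, threshold=0.3)
```

With these toy embeddings, only the caption aligned with its image survives the filter. A real pipeline would batch the embedding computation and tune the threshold per language.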

Authors

Owais Khan Mohammed, Kriti Aggarwal, Qiang Liu, Saksham Singhal, Johan Bjorck, Subhojit Som

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Natural Language Processing > Resources & Methods > Multilingual NLP

Keywords

zero-shot learning, image captioning, vision-language model, multilingual learning, dataset bootstrapping
