LiT: Zero-Shot Transfer With Locked-Image Text Tuning

Xiaohua Zhai; Xiao Wang; Basil Mustafa; Andreas Steiner; Daniel Keysers; Alexander Kolesnikov; Lucas Beyer

2022 CVPR CVPR 2022

LiT: Zero-Shot Transfer With Locked-Image Text Tuning

Abstract

This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training. In our empirical study we find that locked pre-trained image models with unlocked text models work best. We call this instance of contrastive-tuning "Locked-image Tuning" (LiT), which just teaches a text model to read out good representations from a pre-trained image model for new tasks. A LiT model gains the capability of zero-shot transfer to new vision tasks, such as image classification or retrieval. The proposed LiT is widely applicable; it works reliably with multiple pre-training methods (supervised and unsupervised) and across diverse architectures (ResNet, Vision Transformers and MLP-Mixer) using three different image-text datasets. With the transformer-based pre-trained ViT-g/14 model, the LiT model achieves 84.5% zero-shot transfer accuracy on the ImageNet test set, and 81.1% on the challenging out-of-distribution ObjectNet test set.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

🧭 Keyword Pioneer — image text

🐣 Hot Topic Early Bird — vision language

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Xiaohua Zhai , Xiao Wang , Basil Mustafa , Andreas Steiner , Daniel Keysers , Alexander Kolesnikov , Lucas Beyer

Topics

Machine Learning > Learning Types > Zero-Shot Learning Deep Learning > Architectures > Transformers Deep Learning > Learning Types > Multi-Modal Learning Deep Learning > Learning Types > Transfer Learning

Keywords

image classification contrastive learning zero-shot transfer vision language image-text model image text

Download PDF

Related papers

UniCoRN: A Unified Conditional Image Repainting Network 2022

Why Discard if You Can Recycle?: A Recycling Max Pooling Module for 3D Point Cloud Analysis 2022

All-in-One Image Restoration for Unknown Corruption 2022

Stability-Driven Contact Reconstruction From Monocular Color Images 2022

Forecasting Characteristic 3D Poses of Human Actions 2022