Visual Prompting via Image Inpainting

Amir Bar; Yossi Gandelsman; Trevor Darrell; Amir Globerson; Alexei Efros

NeurIPS 2022

Abstract

How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification? Inspired by prompting in NLP, this paper investigates visual prompting: given input-output image example(s) of a new task at test time and a new input image, the goal is to automatically produce the output image, consistent with the given examples. We show that posing this problem as simple image inpainting -- literally just filling in a hole in a concatenated visual prompt image -- turns out to be surprisingly effective, provided that the inpainting algorithm has been trained on the right data. We train masked auto-encoders on a new dataset that we curated -- 88k unlabeled figures from academic paper sources on arXiv. We apply visual prompting to these pretrained models and demonstrate results on various downstream image-to-image tasks, including foreground segmentation, single object detection, colorization, and edge detection. Project page: https://yossigandelsman.github.io/visual_prompt
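The "concatenated visual prompt image" with a hole can be illustrated with a minimal sketch. It assumes a 2x2 grid layout (example input and example output on the top row, the query image bottom-left, the hole bottom-right) and three images of identical shape; the abstract does not specify the layout, so treat it as an assumption. The function names `build_visual_prompt` and `extract_prediction` are hypothetical and do not come from the authors' code. The inpainting model itself is left out: any model that fills the masked region could be applied to the canvas.

```python
import numpy as np


def build_visual_prompt(example_input, example_output, query):
    """Tile three HxWxC images into a 2x2 canvas and mark the hole.

    Assumed layout (not taken from the abstract):
        [ example_input | example_output ]
        [ query         | hole           ]
    Returns the canvas and a boolean mask that is True inside the hole.
    """
    if not (example_input.shape == example_output.shape == query.shape):
        raise ValueError("all three images must share one shape")
    h, w, c = query.shape

    canvas = np.zeros((2 * h, 2 * w, c), dtype=query.dtype)
    canvas[:h, :w] = example_input   # top-left: task example, input
    canvas[:h, w:] = example_output  # top-right: task example, output
    canvas[h:, :w] = query           # bottom-left: new input image

    mask = np.zeros((2 * h, 2 * w), dtype=bool)
    mask[h:, w:] = True              # bottom-right: region to inpaint
    return canvas, mask


def extract_prediction(filled_canvas, mask):
    """Crop the rectangular masked region out of an inpainted canvas."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return filled_canvas[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

In use, the canvas and mask would be passed to an inpainting model, and `extract_prediction` would then crop the filled region as the task output for the query image.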

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Learning Types > Zero-Shot Learning
Deep Learning > Architectures > Autoencoders
Computer Vision > Generation > Image Generation
Computer Vision > Processing > Image Editing
Deep Learning > Learning Types > Prompt Engineering

Keywords

zero-shot learning, test-time adaptation, image-to-image translation, image inpainting, foundation model, masked autoencoder, visual prompting, masked auto-encoder, image-to-image task
