EXIF As Language: Learning Cross-Modal Associations Between Images and Camera Metadata

Chenhao Zheng; Ayush Shrivastava; Andrew Owens

2023 CVPR CVPR 2023

EXIF As Language: Learning Cross-Modal Associations Between Images and Camera Metadata

Abstract

We learn a visual representation that captures information about the camera that recorded a given photo. To do this, we train a multimodal embedding between image patches and the EXIF metadata that cameras automatically insert into image files. Our model represents this metadata by simply converting it to text and then processing it with a transformer. The features that we learn significantly outperform other self-supervised and supervised features on downstream image forensics and calibration tasks. In particular, we successfully localize spliced image regions "zero shot" by clustering the visual embeddings for all of the patches within an image.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning

🧭 Keyword Pioneer — camera metadatum

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Chenhao Zheng , Ayush Shrivastava , Andrew Owens

Topics

Artificial Intelligence > Core AI > Multimodal Learning Machine Learning > Core Methods > Embedding Learning Machine Learning > Learning Types > Self-Supervised Learning Deep Learning > Learning Types > Self-Supervised Learning Artificial Intelligence > Core AI > Computer Vision Artificial Intelligence > Core AI > Multi-Modal Learning

Keywords

self-supervised learning multimodal learning visual representation image forensics multimodal embedding camera metadatum exif metadatum

Download PDF

Related papers

CORA: Adapting CLIP for Open-Vocabulary Detection With Region Prompting and Anchor Pre-Matching 2023

3DAvatarGAN: Bridging Domains for Personalized Editable Avatars 2023

Physics-Driven Diffusion Models for Impact Sound Synthesis From Videos 2023

Transductive Few-Shot Learning With Prototype-Based Label Propagation by Iterative Graph Refinement 2023

Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of Deep Neural Network Explanations 2023