CVPR 2023
EXIF As Language: Learning Cross-Modal Associations Between Images and Camera Metadata
Abstract
We learn a visual representation that captures information about the camera that recorded a given photo. To do this, we train a multimodal embedding between image patches and the EXIF metadata that cameras automatically insert into image files. Our model represents this metadata by simply converting it to text and then processing it with a transformer. The features that we learn significantly outperform other self-supervised and supervised features on downstream image forensics and calibration tasks. In particular, we successfully localize spliced image regions "zero shot" by clustering the visual embeddings for all of the patches within an image.
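The two mechanical steps named in the abstract, writing metadata out as text and grouping patch embeddings within one image, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `tag: value` serialization format, the function names `exif_to_text` and `cluster_patches`, and the use of spherical 2-means as the clustering method are all assumptions made for the example, and the toy 2-D vectors stand in for embeddings that a trained image encoder would produce.

```python
import math


def exif_to_text(tags):
    """Serialize an EXIF tag dictionary as one text string.

    The "name: value" format is an assumption for illustration; the
    resulting string is what a text transformer would then tokenize.
    """
    return ", ".join(f"{name}: {value}" for name, value in tags.items())


def _normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def _dot(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


def cluster_patches(embeddings, iters=10):
    """Split patch embeddings into two groups with spherical 2-means.

    Hypothetical stand-in for the clustering step: patches whose
    embeddings disagree with the rest of the image end up in their own
    cluster, which is then a candidate spliced region.
    """
    vecs = [_normalize(v) for v in embeddings]
    # Deterministic initialization: the first patch, and the patch
    # least similar to it.
    centers = [vecs[0], min(vecs, key=lambda v: _dot(v, vecs[0]))]
    labels = [0] * len(vecs)
    for _ in range(iters):
        # Assign each patch to the more similar center.
        labels = [
            0 if _dot(v, centers[0]) >= _dot(v, centers[1]) else 1
            for v in vecs
        ]
        # Recompute each center as the normalized mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(vecs, labels) if lab == k]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centers[k] = _normalize(mean)
    return labels


# Toy usage: three patches agree with each other, two do not.
caption = exif_to_text({"Make": "Canon", "ISO": 100})
patch_labels = cluster_patches(
    [[1.0, 0.1], [0.9, 0.0], [1.0, 0.05], [0.0, 1.0], [0.1, 0.9]]
)
```

In the toy usage, the last two patches fall into a separate cluster from the first three. In a real pipeline the labels would be mapped back to patch locations to produce a localization mask.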
Topics
Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Core Methods > Embedding Learning
Machine Learning > Learning Types > Self-Supervised Learning
Deep Learning > Learning Types > Self-Supervised Learning
Artificial Intelligence > Core AI > Computer Vision