AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities

Guillaume Astruc; Nicolas Gonthier; Clément Mallet; Loic Landrieu

2025 CVPR CVPR 2025

AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities

Abstract

Geospatial models must adapt to the diversity of Earth observation data in terms of resolutions, scales, and modalities. However, existing approaches expect fixed input configurations, which limits their practical applicability. We propose AnySat, a multimodal model based on joint embedding predictive architecture (JEPA) and scale-adaptive spatial encoders, allowing us to train a single model on highly heterogeneous data in a self-supervised manner. To demonstrate the advantages of this unified approach, we compile GeoPlex, a collection of 5 multimodal datasets with varying characteristics and 11 distinct sensors. We then train a single powerful model on these diverse datasets simultaneously. Once fine-tuned or probed, we achieve state-of-the-art results on the test sets of GeoPlex and for 6 external datasets across various environment monitoring tasks: land cover mapping, tree species identification, crop type classification, change detection, climate type classification, and segmentation of flood, burn scar, and deforestation. Our code and models are available at https://github.com/gastruc/AnySat.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning and Machine Learning

🧭 Keyword Pioneer — scale-adaptive spatial encoder

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Guillaume Astruc , Nicolas Gonthier , Clément Mallet , Loic Landrieu

Topics

Artificial Intelligence > Core AI > Multimodal Learning Artificial Intelligence > Learning Paradigms > Transfer Learning Machine Learning > Learning Types > Self-Supervised Learning Computer Vision > Domain-Specific > Remote Sensing Deep Learning > Learning Types > Self-Supervised Learning Deep Learning > Models > Vision-Language Models

Keywords

domain adaptation self-supervised learning multimodal learning remote sensing earth observation joint embedding predictive architecture scale-adaptive spatial encoder

Download PDF

Related papers

AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos 2025

SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding 2025

FADE: Frequency-Aware Diffusion Model Factorization for Video Editing 2025

Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning 2025

Reversible Decoupling Network for Single Image Reflection Removal 2025