A Simple Recipe for Language-guided Domain Generalized Segmentation

Mohammad Fahes; Tuan-Hung VU; Andrei Bursuc; Patrick Pérez; Raoul de Charette

2024 CVPR CVPR 2024

A Simple Recipe for Language-guided Domain Generalized Segmentation

Abstract

Generalization to new domains not seen during training is one of the long-standing challenges in deploying neural networks in real-world applications. Existing generalization techniques either necessitate external images for augmentation and/or aim at learning invariant representations by imposing various alignment constraints. Large-scale pretraining has recently shown promising generalization capabilities along with the potential of binding different modalities. For instance the advent of vision-language models like CLIP has opened the doorway for vision models to exploit the textual modality. In this paper we introduce a simple framework for generalizing semantic segmentation networks by employing language as the source of randomization. Our recipe comprises three key ingredients: (i) the preservation of the intrinsic CLIP robustness through minimal fine-tuning (ii) language-driven local style augmentation and (iii) randomization by locally mixing the source and augmented styles during training. Extensive experiments report state-of-the-art results on various generalization benchmarks.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🧭 Keyword Pioneer — language-guided augmentation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Mohammad Fahes , Tuan-Hung VU , Andrei Bursuc , Patrick Pérez , Raoul de Charette

Topics

Artificial Intelligence > Core AI > Multimodal Learning Machine Learning > Application Areas > Domain Generalization

Keywords

semantic segmentation domain generalization vision-language model language-guided augmentation

Download PDF

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024