One-Shot Open Affordance Learning with Foundation Models

Gen Li; Deqing Sun; Laura Sevilla-Lara; Varun Jampani

2024 CVPR CVPR 2024

One-Shot Open Affordance Learning with Foundation Models

Abstract

We introduce One-shot Open Affordance Learning (OOAL) where a model is trained with just one example per base object category but is expected to identify novel objects and affordances. While vision-language models excel at recognizing novel objects and scenes they often struggle to understand finer levels of granularity such as affordances. To handle this issue we conduct a comprehensive analysis of existing foundation models to explore their inherent understanding of affordances and assess the potential for data-limited affordance learning. We then propose a vision-language framework with simple and effective designs that boost the alignment between visual features and affordance text embeddings. Experiments on two affordance segmentation benchmarks show that the proposed method outperforms state-of-the-art models with less than 1% of the full training data and exhibits reasonable generalization capability on unseen objects and affordances. Project page: https://reagan1311.github.io/ooal.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Gen Li , Deqing Sun , Laura Sevilla-Lara , Varun Jampani

Topics

Artificial Intelligence > Core AI > Multimodal Learning Artificial Intelligence > Learning Paradigms > Few-Shot Learning Computer Vision > Processing > Image Segmentation Deep Learning > Models > Vision-Language Models

Keywords

one-shot learning image segmentation few-shot learning foundation model vision-language model affordance learning affordance segmentation

Download PDF

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024