CVPR 2024

LASO: Language-guided Affordance Segmentation on 3D Object

Abstract

Segmenting affordance in 3D data is key for bridging perception and action in robots. Existing efforts mostly focus on the visual side and overlook affordance knowledge from a semantic aspect. This oversight not only limits their generalization to unseen objects but, more importantly, hinders their synergy with large language models (LLMs), which are excellent task planners that can decompose an overarching command into agent-actionable instructions. In this regard, we propose a novel task, Language-guided Affordance Segmentation on 3D Object (LASO), which challenges a model to segment the part of a 3D object relevant to a given affordance question. To facilitate the task, we contribute a dataset comprising 19,751 point-question pairs, covering 8,434 object shapes and 870 expert-crafted questions. As a pioneer solution, we further propose PointRefer, which highlights an adaptive fusion module to identify target affordance regions at different scales. To ensure text-aware segmentation, we adopt a set of affordance queries conditioned on linguistic cues to generate dynamic kernels. These kernels are then convolved with point features to generate a segmentation mask. Comprehensive experiments and analyses validate PointRefer's effectiveness. With these efforts, we hope that LASO can steer the direction of 3D affordance, guiding it towards enhanced integration with the evolving capabilities of LLMs.
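
The dynamic-kernel step described in the abstract can be illustrated with a minimal sketch. This is not PointRefer's implementation: the function name `dynamic_kernel_mask`, the feature shapes, the random inputs, and the choice to average the per-query responses before a sigmoid are all assumptions made for illustration. The sketch only shows the general idea that each language-conditioned query can act as a 1x1 convolution kernel over per-point features, which reduces to a dot product per point.

```python
import numpy as np

def dynamic_kernel_mask(point_feats, query_feats):
    """Toy dynamic-kernel segmentation (hypothetical helper, not from the paper).

    point_feats: (N, C) per-point features.
    query_feats: (Q, C) language-conditioned queries used as kernels.
    Returns an (N,) soft mask with values in [0, 1].
    """
    # A 1x1 convolution over points is a matrix product:
    # every query scores every point, giving (N, Q) logits.
    logits = point_feats @ query_feats.T
    # Assumed aggregation: average the responses of all queries.
    fused = logits.mean(axis=1)
    # Sigmoid turns the fused logits into per-point probabilities.
    return 1.0 / (1.0 + np.exp(-fused))

# Random stand-ins for real point and query features (assumed sizes).
rng = np.random.default_rng(0)
point_feats = rng.standard_normal((2048, 64))
query_feats = rng.standard_normal((4, 64))
mask = dynamic_kernel_mask(point_feats, query_feats)
```

Because the kernels are produced from the question rather than learned as fixed weights, the same point features can yield different masks for different affordance questions, which is the sense in which the segmentation is text-aware.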
