LASO: Language-guided Affordance Segmentation on 3D Object

Yicong Li; Na Zhao; Junbin Xiao; Chun Feng; Xiang Wang; Tat-Seng Chua

CVPR 2024

Abstract

Segmenting affordance in 3D data is key for bridging perception and action in robots. Existing efforts mostly focus on the visual side and overlook affordance knowledge from the semantic side. This oversight not only limits their generalization to unseen objects but, more importantly, hinders their synergy with large language models (LLMs), which are excellent task planners that can decompose an overarching command into agent-actionable instructions. In this regard, we propose a novel task, Language-guided Affordance Segmentation on 3D Object (LASO), which challenges a model to segment the part of a 3D object relevant to a given affordance question. To facilitate the task, we contribute a dataset comprising 19,751 point-question pairs, covering 8,434 object shapes and 870 expert-crafted questions. As a pioneer solution, we further propose PointRefer, which highlights an adaptive fusion module to identify target affordance regions at different scales. To ensure text-aware segmentation, we adopt a set of affordance queries conditioned on linguistic cues to generate dynamic kernels. These kernels are then convolved with point features to generate a segmentation mask. Comprehensive experiments and analyses validate PointRefer's effectiveness. With these efforts, we hope that LASO can steer the direction of 3D affordance, guiding it towards enhanced integration with the evolving capabilities of LLMs.
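The abstract's last technical step, in which dynamic kernels are applied to point features to produce a mask, can be illustrated with a minimal sketch. It is not PointRefer's implementation: the function name `segment_with_dynamic_kernels`, the toy feature values, and the choice to average the per-query logits before the sigmoid are all assumptions made for illustration. The kernels are taken as given here; in the paper they are generated from affordance queries conditioned on the question text. Applying a kernel to a point cloud is treated as a per-point dot product, which is what a 1x1 convolution over point features amounts to.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def segment_with_dynamic_kernels(point_feats, kernels):
    """Return one affordance probability per point.

    point_feats: N feature vectors of length C (one per point).
    kernels:     K kernel vectors of length C (one per affordance query).

    Assumption (not from the paper): the K per-query logits are
    averaged into a single logit before the sigmoid.
    """
    if not kernels:
        raise ValueError("need at least one kernel")
    probs = []
    for feat in point_feats:
        # Each kernel acts as a 1x1 convolution: a dot product per point.
        logits = [dot(feat, k) for k in kernels]
        probs.append(sigmoid(sum(logits) / len(logits)))
    return probs


# Toy data (hypothetical): 3 points with 2-dimensional features, 2 kernels.
point_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
kernels = [[4.0, -4.0], [2.0, -2.0]]
mask = segment_with_dynamic_kernels(point_feats, kernels)
```

In this toy setting the first point aligns with both kernels and receives a high probability, the second opposes them and receives a low one, and the third sits exactly on the decision boundary. Because the kernels are inputs rather than fixed weights, a different question would yield different kernels and therefore a different mask over the same point features.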

Topics

Artificial Intelligence > Core AI > Agent Systems

Artificial Intelligence > Core AI > Multimodal Learning

Machine Learning > Application Areas > Domain Generalization

Keywords

semantic segmentation, domain generalization, point cloud, 3D object, language guidance, affordance segmentation
