CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory

Nur Muhammad (Mahi) Shafiullah; Chris Paxton; Lerrel Pinto; Soumith Chintala; Arthur Szlam

RSS 2023

Abstract

We propose CLIP-Fields, an implicit scene model that can be used for a variety of tasks, such as segmentation, instance identification, semantic search over space, and view localization. CLIP-Fields learns a mapping from spatial locations to semantic embedding vectors. Importantly, we show that this mapping can be trained with supervision coming only from models trained on web images and web text, such as CLIP, Detic, and Sentence-BERT, and thus uses no direct human supervision. Our method outperforms baselines such as Mask-RCNN on few-shot instance identification or semantic segmentation on the HM3D dataset while using only a fraction of the examples. Finally, we show that with CLIP-Fields as a scene memory, robots can perform semantic navigation in real-world environments. Our code and demonstration videos are available at https://clip-fields.github.io.
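
The "semantic search over space" use of such a mapping can be illustrated with a minimal sketch. It is not the authors' implementation: the names `toy_field`, `TOY_FIELD`, and `semantic_search` are hypothetical, the trained field is replaced by a three-entry lookup table, and the query vector is hand-written rather than produced by a text encoder such as CLIP or Sentence-BERT. The sketch assumes only that the field returns one embedding per 3D location and that queries are ranked by cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(field, query_embedding, locations):
    """Return the (location, score) pair whose field embedding best matches the query."""
    scored = [(loc, cosine_similarity(field(loc), query_embedding)) for loc in locations]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical stand-in for a trained field: (x, y, z) -> semantic embedding.
# A real field would be a learned function queried at arbitrary coordinates.
TOY_FIELD = {
    (0.0, 0.0, 0.0): [1.0, 0.0, 0.0],
    (1.0, 0.0, 0.5): [0.0, 1.0, 0.0],
    (2.0, 1.0, 0.5): [0.0, 0.6, 0.8],
}


def toy_field(location):
    return TOY_FIELD[location]


# Assumed query embedding; in practice it would come from a text encoder.
query = [0.0, 0.0, 1.0]
best_location, best_score = semantic_search(toy_field, query, list(TOY_FIELD))
print(best_location, best_score)
```

In a navigation setting, the returned location would serve as the goal handed to the robot's planner; how that goal is chosen and refined in practice is described in the paper and code linked above.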

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Learning Types > Weakly Supervised Learning
Computer Vision > Analysis > Semantic Segmentation

Keywords

semantic segmentation; weakly supervised learning; semantic embedding; semantic navigation; instance identification
