Text2Loc: 3D Point Cloud Localization from Natural Language

Yan Xia; Letian Shi; Zifeng Ding; João F. Henriques; Daniel Cremers

CVPR 2024

Abstract

We tackle the problem of 3D point cloud localization based on a few natural language descriptions and introduce a novel neural network, Text2Loc, that fully interprets the semantic relationship between points and text. Text2Loc follows a coarse-to-fine localization pipeline: text-submap global place recognition followed by fine localization. In global place recognition, the relational dynamics among the textual hints are captured by a hierarchical transformer with max-pooling (HTM), while text-submap contrastive learning maintains a balance between positive and negative pairs. Moreover, we propose a novel matching-free fine localization method that further refines the location predictions. It completely removes the need for complicated text-instance matching and is lighter, faster, and more accurate than previous methods. Extensive experiments show that Text2Loc improves localization accuracy by up to 2x over the state of the art on the KITTI360Pose dataset. Our project page is publicly available at: https://yan-xia.github.io/projects/text2loc/.
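The abstract says that text-submap contrastive learning keeps positive and negative pairs in balance, but it does not give a formula. The sketch below is only an illustration. It assumes a CLIP-style symmetric InfoNCE loss: within a batch, text description i and submap i form the positive pair, and every other submap (or text) in the batch serves as a negative. The function names, the cosine similarity, and the temperature of 0.07 are all assumptions made for this example, not details taken from the paper.

```python
import math

# Hypothetical helpers: a generic symmetric InfoNCE sketch, assumed for
# illustration; not the paper's exact loss.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def log_softmax_at(logits, idx):
    """Numerically stable log(softmax(logits)[idx])."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[idx] - lse

def text_submap_contrastive_loss(text_emb, submap_emb, temperature=0.07):
    """Symmetric InfoNCE where text_emb[i] and submap_emb[i] are a positive pair.

    Every other item in the batch acts as a negative, so each positive pair
    is contrasted against N-1 negatives in both directions.
    """
    n = len(text_emb)
    assert n == len(submap_emb) and n > 0
    # Temperature-scaled similarity matrix: rows are texts, columns are submaps.
    sim = [[cosine(t, s) / temperature for s in submap_emb] for t in text_emb]
    # Text -> submap: for row i, the correct column is i.
    loss_t2s = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Submap -> text: for column j, the correct row is j.
    loss_s2t = -sum(
        log_softmax_at([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (loss_t2s + loss_s2t)

texts = [[1.0, 0.0], [0.0, 1.0]]
aligned = text_submap_contrastive_loss(texts, [[1.0, 0.0], [0.0, 1.0]])
swapped = text_submap_contrastive_loss(texts, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

If every embedding is identical, positives cannot be told apart from negatives and the loss equals log N. Correctly aligned pairs push the loss toward zero, and mismatched pairs drive it up.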

Topics

Machine Learning > Core Methods > Metric Learning
Computer Vision > Analysis > 3D Vision
Natural Language Processing > Applications > Information Retrieval
Artificial Intelligence > Core AI > Computer Vision
Artificial Intelligence > Core AI > Multi-Modal Learning

Keywords

contrastive learning, point cloud, natural language, 3d localization, hierarchical transformer, natural language description, point cloud localization, place recognition, 3d point cloud localization, text-submap matching, coarse-to-fine localization