WACV 2025

Pix2Poly: A Sequence Prediction Method for End-to-End Polygonal Building Footprint Extraction from Remote Sensing Imagery

Abstract

Extraction of building footprint polygons from remotely sensed data is essential for several urban understanding tasks such as reconstruction, navigation, and mapping. Despite significant progress in the area, extracting accurate polygonal vector building footprints remains an open problem. In this paper, we introduce Pix2Poly, an attention-based, end-to-end trainable and differentiable deep neural network capable of directly generating explicit, high-quality building footprints in a ring graph format. Pix2Poly employs a generative encoder-decoder transformer to produce a sequence of graph vertex tokens whose connectivity information is learned by an optimal matching network. Compared to previous graph learning methods, ours is a truly end-to-end trainable approach that extracts high-quality building footprints and road networks without requiring complicated, computationally intensive raster loss functions or intricate training pipelines. Upon evaluating Pix2Poly on several complex and challenging datasets, we report that Pix2Poly outperforms state-of-the-art methods in several vector shape quality metrics while being an entirely explicit method. Our code is available at https://github.com/yeshwanth95/Pix2Poly.
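
To make the "ring graph" output described in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that vertex tokens are quantized coordinates laid out as a flat [x0, y0, x1, y1, ...] list, and that connectivity arrives as a binary permutation matrix in which entry (i, j) = 1 means vertex i is followed by vertex j. The function names, the token layout, and the matrix convention are all hypothetical choices made for illustration; the repository linked above is the reference for the actual method.

```python
from typing import List, Tuple


def tokens_to_vertices(
    tokens: List[int], num_bins: int, image_size: int
) -> List[Tuple[float, float]]:
    """Map quantized tokens [x0, y0, x1, y1, ...] to pixel coordinates.

    Assumption: each token is an integer bin index in [0, num_bins - 1].
    """
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of coordinate tokens")
    scale = (image_size - 1) / (num_bins - 1)
    return [
        (tokens[i] * scale, tokens[i + 1] * scale)
        for i in range(0, len(tokens), 2)
    ]


def rings_from_permutation(perm: List[List[int]]) -> List[List[int]]:
    """Split a permutation matrix into closed rings of vertex indices.

    Assumption: perm[i][j] == 1 means vertex i connects to vertex j, and
    every row and column contains exactly one 1. Each cycle of the
    permutation then corresponds to one polygon ring.
    """
    successor = [row.index(1) for row in perm]
    visited = [False] * len(perm)
    rings = []
    for start in range(len(perm)):
        if visited[start]:
            continue
        ring = []
        v = start
        # Follow successors until the walk returns to a visited vertex.
        while not visited[v]:
            visited[v] = True
            ring.append(v)
            v = successor[v]
        rings.append(ring)
    return rings


# Hypothetical example: a square (vertices 0-3) and a triangle (4-6).
successors = [1, 2, 3, 0, 5, 6, 4]
perm = [[1 if j == s else 0 for j in range(7)] for s in successors]
rings = rings_from_permutation(perm)
vertices = tokens_to_vertices([0, 0, 223, 0], num_bins=224, image_size=224)
```

Under these assumptions, a single predicted permutation can encode several buildings at once, since each cycle is read off as a separate closed polygon.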
