Semantic Segmentation by Early Region Proxy

Yifan Zhang; Bo Pang; Cewu Lu

CVPR 2022

Abstract

Typical vision backbones manipulate structured features. As a compromise, semantic segmentation has long been modeled as per-point prediction on dense regular grids. In this work, we present a novel and efficient modeling approach that starts from interpreting the image as a tessellation of learnable regions, each of which has flexible geometry and carries homogeneous semantics. To model region-wise context, we use a Transformer to encode regions in a sequence-to-sequence manner by applying multi-layer self-attention on the region embeddings, which serve as proxies of specific regions. Semantic segmentation is then carried out as per-region prediction on top of the encoded region embeddings using a single linear classifier, so a decoder is no longer needed. The proposed RegProxy model discards the common Cartesian feature layout and operates purely at the region level. Hence, it exhibits the most competitive performance-efficiency trade-off compared with conventional dense prediction methods. For example, on ADE20K, the small-sized RegProxy-S/16 outperforms the best CNN model using 25% of the parameters and 4% of the computation, while the largest RegProxy-L/16 achieves 52.9 mIoU, which outperforms the state of the art by 2.1% with fewer resources. Code and models are available at https://github.com/YiF-Zhang/RegionProxy.
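The per-region prediction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the region embeddings have already been encoded, and it assumes a soft pixel-to-region assignment matrix whose rows sum to 1. The function names (`linear_classifier`, `regions_to_pixels`, `segment`) and the `assignment` matrix are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; names and the assignment matrix are assumptions,
# not taken from the RegProxy code base.

def linear_classifier(embeddings, weights, bias):
    """Apply a single linear layer to every region embedding."""
    return [
        [sum(e[d] * weights[d][k] for d in range(len(e))) + bias[k]
         for k in range(len(bias))]
        for e in embeddings
    ]

def regions_to_pixels(region_logits, assignment):
    """Spread region logits to pixels.

    assignment[p][r] is the assumed probability that pixel p belongs to
    region r; each row is expected to sum to 1.
    """
    num_classes = len(region_logits[0])
    return [
        [sum(a * region_logits[r][k] for r, a in enumerate(row))
         for k in range(num_classes)]
        for row in assignment
    ]

def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)

def segment(embeddings, weights, bias, assignment):
    """Per-region classification followed by pixel-level label lookup."""
    region_logits = linear_classifier(embeddings, weights, bias)
    pixel_logits = regions_to_pixels(region_logits, assignment)
    return [argmax(row) for row in pixel_logits]

# Toy example: 2 regions, 2-dimensional embeddings, 2 classes, 3 pixels.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]   # identity classifier for clarity
bias = [0.0, 0.0]
assignment = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2]]
labels = segment(embeddings, weights, bias, assignment)
```

In this toy setup the classifier runs once per region rather than once per pixel, which is the source of the efficiency the abstract points to; the pixel-level map is recovered only at the end through the assignment weights.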

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Core Methods > Representation Learning
Deep Learning > Architectures > Transformers
Computer Vision > Analysis > Semantic Segmentation
Computer Vision > Processing > Semantic Segmentation
Deep Learning > Learning Types > Representation Learning

Keywords

semantic segmentation, image segmentation, vision transformer, computer vision, region proposal, dense prediction, region embedding, per-region prediction, learnable region
