WACV 2020

VRT-Net: Real-Time Scene Parsing via Variable Resolution Transform

Abstract

Urban scene parsing is a basic requirement for autonomous navigation systems, especially self-driving vehicles. Most available approaches employ generic image parsing architectures designed for segmenting object-focused scenes captured in indoor setups. However, images captured by car-mounted cameras exhibit an extreme perspective effect, causing a significant scale disparity between near and far objects. Recognizing this, we formalize a unique Variable Resolution Transform (VRT) technique motivated by foveal magnification in the human eye. We then design a Fovea Estimation Network (FEN), which is trained to estimate a single fixation location, along with the associated magnification factor, best suited to a given input image. The proposed framework is designed to work as a wrapper over available real-time scene parsing models, and it demonstrates a superior trade-off between speed and quality compared to the prior state of the art.
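To make the idea of magnifying around a fixation point concrete, the following is a minimal sketch of a generic foveal-style warp. It is not the VRT formulation from the paper, which the abstract does not specify. The function names `foveal_coords` and `foveal_warp`, the rational mapping `t / (mag - (mag - 1) * t)`, and the nearest-neighbour sampling are all assumptions chosen for illustration. In the described framework the fixation location and magnification factor would come from the FEN; here they are set by hand.

```python
import numpy as np

def foveal_coords(n, center, mag):
    """Source index for each of n output pixels along one axis.

    Hypothetical mapping (not from the paper): `center` is the fixation
    position in normalized (0, 1) coordinates and `mag` >= 1 is the
    magnification at that position. mag == 1 is the identity.
    """
    v = np.linspace(0.0, 1.0, n)                        # output positions
    side = np.where(v >= center, 1.0 - center, center)  # span on each side
    t = np.abs(v - center) / side                       # 0 at fixation, 1 at border
    h = t / (mag - (mag - 1.0) * t)                     # slope 1/mag at 0, h(1) = 1
    u = center + np.sign(v - center) * side * h         # source positions
    return np.clip(np.rint(u * (n - 1)), 0, n - 1).astype(int)

def foveal_warp(image, cx, cy, mag):
    """Resample `image` (H x W or H x W x C) with nearest-neighbour lookup,
    enlarging the region around the fixation point (cx, cy)."""
    rows = foveal_coords(image.shape[0], cy, mag)
    cols = foveal_coords(image.shape[1], cx, mag)
    return image[np.ix_(rows, cols)]

# Usage: fixation slightly above the image centre, 3x magnification there.
frame = np.random.rand(128, 256, 3)
warped = foveal_warp(frame, cx=0.5, cy=0.4, mag=3.0)
```

The output keeps the input resolution, so a warp of this kind could sit in front of an unmodified segmentation model. Predictions would then need the inverse mapping to return to the original image geometry, which this sketch does not cover.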
