S3-Net: A Fast and Lightweight Video Scene Understanding Network by Single-Shot Segmentation

Yuan Cheng; Yuchao Yang; Hai-Bao Chen; Ngai Wong; Hao Yu

WACV 2021

Abstract

Real-time video understanding is crucial in various AI applications such as autonomous driving. This work presents a fast single-shot segmentation strategy for video scene understanding. The proposed network, called S3-Net, quickly locates and segments target sub-scenes while extracting structured time-series semantic features as inputs to an LSTM-based spatio-temporal model. By applying tensorization and quantization techniques, S3-Net is intended to be lightweight for edge computing. Experiments on the CityScapes, UCF11, HMDB51 and MOMENTS datasets demonstrate that S3-Net achieves an accuracy improvement of 8.1% over the 3D-CNN based approach on UCF11, a storage reduction of 6.9x, and an inference speed of 22.8 FPS on CityScapes with a GTX1080Ti GPU.
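The abstract does not say which tensorization or quantization scheme S3-Net uses. As a rough illustration of why such techniques reduce storage, the sketch below assumes a tensor-train style factorization of a single fully connected weight matrix combined with low-bit weight storage. The layer shapes, ranks, and bit widths are hypothetical and are not taken from the paper, so the resulting ratio is unrelated to the reported 6.9x figure.

```python
from math import prod


def dense_params(in_modes, out_modes):
    """Parameter count of a dense weight matrix of size prod(in) x prod(out)."""
    return prod(in_modes) * prod(out_modes)


def tt_params(in_modes, out_modes, ranks):
    """Parameter count of a tensor-train factorization of the same matrix.

    Core k has shape (ranks[k], in_modes[k], out_modes[k], ranks[k + 1]).
    The boundary ranks must both be 1. All values here are illustrative.
    """
    d = len(in_modes)
    if len(out_modes) != d or len(ranks) != d + 1:
        raise ValueError("need d input modes, d output modes and d + 1 ranks")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary ranks must be 1")
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1] for k in range(d)
    )


def storage_bytes(num_params, bits_per_weight):
    """Bytes needed to store num_params weights, rounded up to whole bytes."""
    return (num_params * bits_per_weight + 7) // 8


# Hypothetical 1024 x 256 layer, factorized into four cores of rank 4.
in_modes = (4, 8, 8, 4)
out_modes = (4, 4, 4, 4)
ranks = (1, 4, 4, 4, 1)

dense = dense_params(in_modes, out_modes)
tt = tt_params(in_modes, out_modes, ranks)

# Compare 32-bit dense storage with 8-bit tensor-train storage.
dense_bytes = storage_bytes(dense, 32)
tt_bytes = storage_bytes(tt, 8)
print(dense, tt, dense_bytes, tt_bytes)
```

The two effects multiply: factorization cuts the number of stored weights, and quantization cuts the bits per weight. Whether accuracy survives a given rank and bit width has to be measured on the target task.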

Authors

Yuan Cheng, Yuchao Yang, Hai-Bao Chen, Ngai Wong, Hao Yu

Topics

Machine Learning > Application Areas > Efficient Computing

Computer Vision > Processing > Video Understanding

Computer Vision > Domain-Specific > Autonomous Driving

Keywords

semantic segmentation, real-time processing, edge computing, video scene understanding, single-shot segmentation
