Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline

Kailai Zhou; Yibo Wang; Tao Lv; Yunqian Li; Linsen Chen; Qiu Shen; Xun Cao

CVPR 2022

Abstract

We address a rarely explored task named Insubstantial Object Detection (IOD), which aims to localize objects with the following characteristics: (1) amorphous shape with indistinct boundary; (2) similarity to surroundings; (3) absence in color. Accordingly, it is far more challenging to distinguish insubstantial objects in a single static frame, and the collaborative representation of spatial and temporal information is crucial. Thus, we construct the IOD-Video dataset, comprising 600 videos (141,017 frames) that cover various distances, sizes, visibility, and scenes captured in different spectral ranges. In addition, we develop a spatio-temporal aggregation framework for IOD, in which different backbones are deployed and a spatio-temporal aggregation loss (STAloss) is elaborately designed to leverage the consistency along the time axis. Experiments conducted on the IOD-Video dataset demonstrate that spatio-temporal aggregation can significantly improve the performance of IOD. We hope our work will attract further research into this valuable yet challenging task. The code will be available at: https://github.com/CalayZhou/IOD-Video.

Topics

Computer Vision > Analysis > Object Detection
Computer Vision > Processing > Video Processing
Computer Vision > Processing > Video Understanding
Computer Vision > Analysis > Video Understanding
Artificial Intelligence > Core AI > Computer Vision

Keywords

anomaly detection, object detection, motion estimation, video understanding, video dataset, temporal consistency, video processing, spatio-temporal aggregation
