TubeFormer-DeepLab: Video Mask Transformer

Dahun Kim; Jun Xie; Huiyu Wang; Siyuan Qiao; Qihang Yu; Hong-Seok Kim; Hartwig Adam; In So Kweon; Liang-Chieh Chen

CVPR 2022


Abstract

We present TubeFormer-DeepLab, the first attempt to tackle multiple core video segmentation tasks in a unified manner. Different video segmentation tasks (e.g., video semantic/instance/panoptic segmentation) are usually considered as distinct problems. State-of-the-art models adopted in the separate communities have diverged, and radically different approaches dominate in each task. By contrast, we make a crucial observation that video segmentation tasks could be generally formulated as the problem of assigning different predicted labels to video tubes (where a tube is obtained by linking segmentation masks along the time axis) and the labels may encode different values depending on the target task. The observation motivates us to develop TubeFormer-DeepLab, a simple and effective video mask transformer model that is widely applicable to multiple video segmentation tasks. TubeFormer-DeepLab directly predicts video tubes with task-specific labels (either pure semantic categories, or both semantic categories and instance identities), which not only significantly simplifies video segmentation models, but also advances state-of-the-art results on multiple video segmentation benchmarks.
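The tube formulation in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `tubes_to_video_labels`, the `-1` "unlabeled" value, and the assumption that tubes do not overlap are all hypothetical choices made for illustration. Each tube is a binary mask of shape `(T, H, W)` linked along the time axis, and it carries either a semantic class alone or a semantic class plus an instance identity, depending on the task.

```python
import numpy as np


def tubes_to_video_labels(tube_masks, class_ids, instance_ids=None):
    """Paint per-pixel video labels from N labeled tubes (illustrative only).

    tube_masks:   bool array of shape (N, T, H, W), assumed non-overlapping.
    class_ids:    N semantic class ids, one per tube.
    instance_ids: optional N instance ids; omit for pure semantic segmentation.
    Returns (semantic, instance) maps of shape (T, H, W); -1 means unlabeled.
    """
    tube_masks = np.asarray(tube_masks, dtype=bool)
    n, t, h, w = tube_masks.shape
    semantic = np.full((t, h, w), -1, dtype=np.int64)
    instance = np.full((t, h, w), -1, dtype=np.int64)
    for i in range(n):
        # One label per tube: the same value is written in every frame.
        semantic[tube_masks[i]] = class_ids[i]
        if instance_ids is not None:
            instance[tube_masks[i]] = instance_ids[i]
    return semantic, instance


# Toy clip: 2 frames of 2x2 pixels, two tubes (left column, right column).
masks = np.zeros((2, 2, 2, 2), dtype=bool)
masks[0, :, :, 0] = True  # tube 0 covers the left column in both frames
masks[1, :, :, 1] = True  # tube 1 covers the right column in both frames

# Instance/panoptic-style labels: class id plus instance id per tube.
sem, inst = tubes_to_video_labels(masks, class_ids=[1, 0], instance_ids=[7, 0])

# Semantic-only labels: same tubes, no instance identities.
sem_only, inst_only = tubes_to_video_labels(masks, class_ids=[1, 0])
```

In this toy setup the same tube masks serve both tasks, and only the label attached to each tube changes, which is the point the abstract makes about a unified formulation.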



Topics

Deep Learning > Architectures > Transformers
Computer Vision > Analysis > Semantic Segmentation
Computer Vision > Analysis > Video Understanding
Computer Vision > Processing > Video Segmentation

Keywords

semantic segmentation, video segmentation, video instance segmentation, video panoptic segmentation, mask transformer, video semantic segmentation, video tube


Related papers

UniCoRN: A Unified Conditional Image Repainting Network 2022

Why Discard if You Can Recycle?: A Recycling Max Pooling Module for 3D Point Cloud Analysis 2022

All-in-One Image Restoration for Unknown Corruption 2022

Stability-Driven Contact Reconstruction From Monocular Color Images 2022

Forecasting Characteristic 3D Poses of Human Actions 2022