Guided Slot Attention for Unsupervised Video Object Segmentation

Minhyeok Lee; Suhwan Cho; Dogyoon Lee; Chaewon Park; Jungho Lee; Sangyoun Lee

CVPR 2024

Abstract

Unsupervised video object segmentation aims to segment the most prominent object in a video sequence. However, the existence of complex backgrounds and multiple foreground objects makes this task challenging. To address this issue, we propose a guided slot attention network to reinforce spatial structural information and obtain better foreground-background separation. The foreground and background slots, which are initialized with query guidance, are iteratively refined based on interactions with template information. Furthermore, to improve slot-template interaction and effectively fuse global and local features in the target and reference frames, K-nearest neighbors filtering and a feature aggregation transformer are introduced. The proposed model achieves state-of-the-art performance on two popular datasets. Additionally, we demonstrate the robustness of the proposed model in challenging scenes through various comparative experiments.
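To make the idea of guided, iteratively refined slots with K-nearest neighbors filtering more concrete, here is a minimal NumPy sketch. It is not the authors' architecture: the function names (`init_slots`, `knn_filter`, `refine_slots`), the mask-weighted initialization, the per-slot softmax, and the absence of any learned projections or recurrent update are all assumptions made for illustration. It only shows one plausible way that a foreground slot and a background slot could be initialized from a coarse guidance mask and then refined by attending to their k most similar features.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax; entries equal to -inf receive weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_slots(features, guide, eps=1e-8):
    # Hypothetical guidance: `guide` is a soft foreground mask in [0, 1], shape (N,).
    # The foreground slot is the mask-weighted mean feature; the background
    # slot uses the complementary mask.
    fg = (guide[:, None] * features).sum(axis=0) / (guide.sum() + eps)
    bg = ((1.0 - guide)[:, None] * features).sum(axis=0) / ((1.0 - guide).sum() + eps)
    return np.stack([fg, bg])  # shape (2, D)

def knn_filter(scores, k):
    # Keep the k highest similarity scores in each row; mask the rest with -inf.
    idx = np.argpartition(scores, -k, axis=1)[:, -k:]
    filtered = np.full_like(scores, -np.inf)
    np.put_along_axis(filtered, idx, np.take_along_axis(scores, idx, axis=1), axis=1)
    return filtered

def refine_slots(slots, features, k, iters=3):
    # Assumed simplification: each slot is replaced by the attention-weighted
    # mean of its k nearest features, repeated `iters` times.
    d = slots.shape[1]
    for _ in range(iters):
        scores = slots @ features.T / np.sqrt(d)      # (S, N) scaled dot products
        attn = softmax(knn_filter(scores, k), axis=1)  # (S, N), k nonzeros per row
        slots = attn @ features                        # (S, D)
    return slots, attn

# Toy usage: 16 feature vectors of dimension 8, the first 6 marked as foreground.
rng = np.random.default_rng(0)
features = rng.normal(size=(16, 8))
guide = (np.arange(16) < 6).astype(float)
slots = init_slots(features, guide)
new_slots, attn = refine_slots(slots, features, k=4, iters=3)
```

In this sketch the filtering step limits each slot to a small set of its most similar features, so dissimilar features cannot influence the update. The paper's actual slot update and its feature aggregation transformer are not reproduced here.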

Topics

Machine Learning > Learning Types > Unsupervised Learning
Deep Learning > Architectures > Transformers
Computer Vision > Processing > Image Segmentation
Computer Vision > Processing > Video Processing
Computer Vision > Processing > Video Segmentation
Computer Vision > Analysis > Object Segmentation

Keywords

unsupervised learning; slot attention; feature aggregation; foreground-background separation; video object segmentation; unsupervised video object segmentation; feature aggregation transformer; KNN filtering
