AAAI 2026

Segment Anything Across Shots: A Method and Benchmark

Abstract

This work focuses on multi-shot semi-supervised video object segmentation (MVOS), which aims to segment the target object indicated by an initial mask throughout a video containing multiple shots. Existing VOS methods mainly target single-shot videos and often fail at shot discontinuities, which limits their real-world applicability. Furthermore, the lack of annotated multi-shot data poses a major challenge for MVOS research. To address these issues, we propose a transition-mimicking data augmentation strategy (TMA) that enables cross-shot generalization using only single-shot data, and a transition-aware method, Segment Anything Across Shots (SAAS), which detects and comprehends shot transitions during inference. To support evaluation and future study in MVOS, we introduce Cut-VOS, a new MVOS benchmark with dense mask annotations, diverse object categories, and high-frequency transitions. Extensive experiments on YouMVOS and Cut-VOS demonstrate that the proposed SAAS achieves state-of-the-art performance by effectively mimicking, understanding, and segmenting across complex transitions.
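
The abstract does not say how TMA is implemented. As a rough illustration of the general idea of mimicking a shot transition with single-shot data, the sketch below fakes a hard cut inside one clip: every frame from a chosen index onward is cropped, zoomed back to full size, and flipped horizontally, with the same transform applied to the masks. The function name `mimic_cut`, the fixed half-size crop, and the choice of a flip are assumptions made for this example, not the authors' method.

```python
import numpy as np

def mimic_cut(frames, masks, cut, rng):
    """Fake a hard cut at index `cut` inside a single-shot clip (hypothetical helper).

    frames: (T, H, W, C) array, masks: (T, H, W) array.
    Frames before `cut` are untouched; frames from `cut` on share one
    random crop-zoom-flip, so they look like a second, consistent shot.
    """
    T, H, W = masks.shape
    if not 0 < cut < T:
        raise ValueError("cut must satisfy 0 < cut < T")

    # Crop window at half resolution (assumed ratio), at a random position.
    ch, cw = max(1, H // 2), max(1, W // 2)
    top = int(rng.integers(0, H - ch + 1))
    left = int(rng.integers(0, W - cw + 1))

    # Nearest-neighbour index maps that stretch the crop back to (H, W).
    rows = top + (np.arange(H) * ch) // H
    cols = left + (np.arange(W) * cw) // W
    cols = cols[::-1]  # horizontal flip

    out_frames, out_masks = frames.copy(), masks.copy()
    # The same index maps are applied to images and masks to keep them aligned.
    out_frames[cut:] = frames[cut:][:, rows][:, :, cols]
    out_masks[cut:] = masks[cut:][:, rows][:, :, cols]
    return out_frames, out_masks

# Usage on a synthetic 6-frame clip with a cut before frame 3.
rng = np.random.default_rng(0)
masks = (rng.random((6, 8, 10)) > 0.5).astype(np.uint8)
frames = np.repeat((masks * 255)[..., None], 3, axis=-1)
aug_frames, aug_masks = mimic_cut(frames, masks, cut=3, rng=rng)
```

Because the crop position is random, the target can partly or fully leave the simulated second shot, so a training pipeline built on a sketch like this would need to decide whether to keep or resample such clips.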
