Video Understanding

1592 directly classified papers

Papers

Joint Self-Supervised Video Alignment and Action Segmentation ICCV 2025

MOVE: Motion-Guided Few-Shot Video Object Segmentation ICCV 2025

StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition ICCV 2025

ReferEverything: Towards Segmenting Everything We Can Speak of in Videos ICCV 2025

VideoSetDiff: Identifying and Reasoning Similarities and Differences in Similar Videos ICCV 2025

OVG-HQ: Online Video Grounding with Hybrid-modal Queries ICCV 2025

MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding ICCV 2025

Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding ICCV 2025

KDA: Knowledge Diffusion Alignment with Enhanced Context for Video Temporal Grounding ICCV 2025

What's Making That Sound Right Now? Video-centric Audio-Visual Localization ICCV 2025

DisTime: Distribution-based Time Representation for Video Large Language Models ICCV 2025

Fine-grained Spatiotemporal Grounding on Egocentric Videos ICCV 2025

Open-World Skill Discovery from Unsegmented Demonstration Videos ICCV 2025

DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding ICCV 2025

CRAM: Large Scale Video Continual Learning with Bootstrapped Compression ICCV 2025

Multi-Scale Contrastive Learning for Video Temporal Grounding AAAI 2025

Multi-Granularity Video Object Segmentation AAAI 2025

Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning ICCV 2025

Exploring Temporal Event Cues for Dense Video Captioning in Cyclic Co-Learning AAAI 2025

Robust and Consistent Online Video Instance Segmentation via Instance Mask Propagation AAAI 2025

Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection AAAI 2025

Diversifying Query: Region-Guided Transformer for Temporal Sentence Grounding AAAI 2025

A Dataset for Programming-based Instructional Video Classification and Question Answering COLING 2025

Dense Audio-Visual Event Localization Under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration AAAI 2025

MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent ICCV 2025