Video Understanding

1592 directly classified papers

Papers

Temporal-aware Query Routing for Real-time Video Instance Segmentation ICCV 2025

Learning Beyond Still Frames: Scaling Vision-Language Models with Video ICCV 2025

Joint Self-Supervised Video Alignment and Action Segmentation ICCV 2025

RoMo: Robust Motion Segmentation Improves Structure from Motion ICCV 2025

MOVE: Motion-Guided Few-Shot Video Object Segmentation ICCV 2025

Multi-Modal Few-Shot Temporal Action Segmentation ICCV 2025

Snakes and Ladders: Two Steps Up for VideoMamba ICCV 2025

How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach ICCV 2025

MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent ICCV 2025

Efficient Motion-Aware Video MLLM CVPR 2025

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs CVPR 2025

ActionDiffusion: An Action-Aware Diffusion Model for Procedure Planning in Instructional Videos WACV 2025

OVG-HQ: Online Video Grounding with Hybrid-modal Queries ICCV 2025

Exploring Fine-Grained Human Motion Video Captioning COLING 2025

MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding ICCV 2025

Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering CVPR 2025

Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding ICCV 2025

HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation CVPR 2025

KDA: Knowledge Diffusion Alignment with Enhanced Context for Video Temporal Grounding ICCV 2025

Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models CVPR 2025

What's Making That Sound Right Now? Video-centric Audio-Visual Localization ICCV 2025

Flexible Frame Selection for Efficient Video Reasoning CVPR 2025

The Devil is in the Spurious Correlations: Boosting Moment Retrieval with Dynamic Learning ICCV 2025

Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos CVPR 2025

Diffusion-based 3D Hand Motion Recovery with Intuitive Physics ICCV 2025