Video Understanding

1592 directly classified papers

Papers

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding CVPR 2025

Temporal-aware Query Routing for Real-time Video Instance Segmentation ICCV 2025

HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization CVPR 2025

Exploring Contextual Attribute Density in Referring Expression Counting CVPR 2025

Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes CVPR 2025

MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding ICCV 2025

OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts CVPR 2025

VideoChain: A Transformer-Based Framework for Multi-hop Video Question Generation IJCNLP 2025

KDA: Knowledge Diffusion Alignment with Enhanced Context for Video Temporal Grounding ICCV 2025

GaraMoSt: Parallel Multi-Granularity Motion and Structural Modeling for Efficient Multi-Frame Interpolation in DSA Images AAAI 2025

Efficient Motion-Aware Video MLLM CVPR 2025

LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging CVPR 2025

Re-thinking Temporal Search for Long-Form Video Understanding CVPR 2025

TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos ICCV 2025

HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation CVPR 2025

Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval ICCV 2025

MDP3: A Training-free Approach for List-wise Frame Selection in Video-LLMs ICCV 2025

Dense Audio-Visual Event Localization Under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration AAAI 2025

DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation ICCV 2025

Moment Quantization for Video Temporal Grounding ICCV 2025

Vid-Group: Temporal Video Grounding Pretraining from Unlabeled Videos in the Wild ICCV 2025

Generic Event Boundary Detection via Denoising Diffusion ICCV 2025

What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning ICCV 2025

TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision ICCV 2025

Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering CVPR 2025