Video Understanding

1098 directly classified papers

Papers

Robust and Consistent Online Video Instance Segmentation via Instance Mask Propagation AAAI 2025

Remote Photoplethysmography in Real-World and Extreme Lighting Scenarios CVPR 2025

SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation CVPR 2025

Deep Temporal Reasoning in Video Language Models: A Cross-Linguistic Evaluation of Action Duration and Completion through Perfect Times ACL 2025

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos CVPR 2025

HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly ICCV 2025

VideoGEM: Training-free Action Grounding in Videos CVPR 2025

EgoNormia: Benchmarking Physical-Social Norm Understanding ACL 2025

Zero-Shot Scene Change Detection AAAI 2025

MammAlps: A Multi-view Video Behavior Monitoring Dataset of Wild Mammals in the Swiss Alps CVPR 2025

Re-thinking Temporal Search for Long-Form Video Understanding CVPR 2025

DiffDVC: Accurate Event Detection for Dense Video Captioning via Diffusion Models AAAI 2025

ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries AAAI 2025

Multi-Edge Reinforced Collaborative Data Acquisition for Continuous Video Analytics by Prioritizing Quality over Quantity AAAI 2025

Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing AAAI 2025

AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification CVPR 2025

Sharper and Faster mean Better: Towards More Efficient Vision-Language Model for Hour-scale Long Video Understanding ACL 2025

Guess Future Anomalies from Normalcy: Forecasting Abnormal Behavior in Real-World Videos WACV 2025

EgoPoints: Advancing Point Tracking for Egocentric Videos WACV 2025

Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding NAACL 2025

Balancing Shared and Task-Specific Representations: A Hybrid Approach to Depth-Aware Video Panoptic Segmentation WACV 2025

Temporal Action Localization with Cross Layer Task Decoupling and Refinement AAAI 2025

Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection AAAI 2025

Learning to Visually Connect Actions and their Effects WACV 2025

M-LLM Based Video Frame Selection for Efficient Video Understanding CVPR 2025