Video Understanding

1098 directly classified papers

Papers

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? CVPR 2025

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding CVPR 2025

Beyond Image Classification: A Video Benchmark and Dual-Branch Hybrid Discrimination Framework for Compositional Zero-Shot Learning CVPR 2025

Graph-Jigsaw Conditioned Diffusion Model for Skeleton-Based Video Anomaly Detection WACV 2025

ActionDiffusion: An Action-Aware Diffusion Model for Procedure Planning in Instructional Videos WACV 2025

MSR2: A Benchmark for Multi-Source Retrieval and Reasoning in Visual Question Answering NAACL 2025

Efficient Motion-Aware Video MLLM CVPR 2025

Generic Event Boundary Detection via Denoising Diffusion ICCV 2025

Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos CVPR 2025

AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing CVPR 2025

SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning CVPR 2025

MLVU: Benchmarking Multi-task Long Video Understanding CVPR 2025

KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception CVPR 2025

Deep Temporal Reasoning in Video Language Models: A Cross-Linguistic Evaluation of Action Duration and Completion through Perfect Times ACL 2025

Anomize: Better Open Vocabulary Video Anomaly Detection CVPR 2025

EgoNormia: Benchmarking Physical-Social Norm Understanding ACL 2025

HuPerFlow: A Comprehensive Benchmark for Human vs. Machine Motion Estimation Comparison CVPR 2025

CASP: Consistency-aware Audio-induced Saliency Prediction Model for Omnidirectional Video CVPR 2025

FIction: 4D Future Interaction Prediction from Video CVPR 2025

Joint Self-Supervised Video Alignment and Action Segmentation ICCV 2025

Reliable and Diverse Hierarchical Adapter for Zero-shot Video Classification IJCAI 2025

Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection ICCV 2025

How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach ICCV 2025

VideoSetDiff: Identifying and Reasoning Similarities and Differences in Similar Videos ICCV 2025

HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly ICCV 2025