Video Understanding

1098 directly classified papers

Papers

HVGuard: Utilizing Multimodal Large Language Models for Hateful Video Detection EMNLP 2025

MVAD: A Multiple Visual Artifact Detector for Video Streaming WACV 2025

Static or Dynamic: Towards Query-Adaptive Token Selection for Video Question Answering EMNLP 2025

Anomize: Better Open Vocabulary Video Anomaly Detection CVPR 2025

Transparent and Coherent Procedural Mistake Detection EMNLP 2025

Towards Safer and Understandable Driver Intention Prediction ICCV 2025

ProLongVid: A Simple but Strong Baseline for Long-context Video Instruction Tuning EMNLP 2025

PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution CVPR 2025

Investigating Dictionary Expansion for Video-based Sign Language Dictionaries EMNLP 2025

EVOLVE: Event-Guided Deformable Feature Transfer and Dual-Memory Refinement for Low-Light Video Object Segmentation ICCV 2025

KDA: Knowledge Diffusion Alignment with Enhanced Context for Video Temporal Grounding ICCV 2025

Beyond Image Classification: A Video Benchmark and Dual-Branch Hybrid Discrimination Framework for Compositional Zero-Shot Learning CVPR 2025

DiffDVC: Accurate Event Detection for Dense Video Captioning via Diffusion Models AAAI 2025

Zero-Shot Scene Change Detection AAAI 2025

HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization CVPR 2025

Similar Modality Enhancement and Action Consistency Learning for Weakly Supervised Temporal Action Localization AAAI 2025

VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos AAAI 2025

Interacted Object Grounding in Spatio-Temporal Human-Object Interactions AAAI 2025

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? CVPR 2025

Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection WACV 2025

Hierarchical Vector Quantization for Unsupervised Action Segmentation AAAI 2025

Diversifying Query: Region-Guided Transformer for Temporal Sentence Grounding AAAI 2025

HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation CVPR 2025

Dual Conditioned Motion Diffusion for Pose-Based Video Anomaly Detection AAAI 2025

Reasoning is All You Need for Video Generalization: A Counterfactual Benchmark with Sub-question Evaluation ACL 2025