Video Understanding

1592 directly classified papers

Papers

Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better CVPR 2025

Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations CVPR 2025

RELOCATE: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations CVPR 2025

Online Generic Event Boundary Detection ICCV 2025

Predicting Implicit Arguments in Procedural Video Instructions ACL 2025

When the Future Becomes the Past: Taming Temporal Correspondence for Self-supervised Video Representation Learning CVPR 2025

MLVU: Benchmarking Multi-task Long Video Understanding CVPR 2025

Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning ICCV 2025

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary CVPR 2025

TemCoCo: Temporally Consistent Multi-modal Video Fusion with Visual-Semantic Collaboration ICCV 2025

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction CVPR 2025

ViSpeak: Visual Instruction Feedback in Streaming Videos ICCV 2025

Multimodal Fusion and Coherence Modeling for Video Topic Segmentation ACL 2025

Revisiting Audio-Visual Segmentation with Vision-Centric Transformer CVPR 2025

KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception CVPR 2025

HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization CVPR 2025

TACO: Taming Diffusion for in-the-wild Video Amodal Completion ICCV 2025

Temporal Alignment-Free Video Matching for Few-shot Action Recognition CVPR 2025

Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs CVPR 2025

Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs CVPR 2025

Language Repository for Long Video Understanding ACL 2025

VILLS: Video-Image Learning to Learn Semantics for Person Re-Identification WACV 2025

LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging CVPR 2025

Learning Conditional Space-Time Prompt Distributions for Video Class-Incremental Learning CVPR 2025

Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding ICCV 2025