Video Understanding

1592 directly classified papers

Papers

MITracker: Multi-View Integration for Visual Object Tracking CVPR 2025

EntitySAM: Segment Everything in Video CVPR 2025

LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living CVPR 2025

VidSeg: Training-free Video Semantic Segmentation based on Diffusion Models CVPR 2025

SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning CVPR 2025

Aligning Moments in Time using Video Queries ICCV 2025

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation CVPR 2025

SADA: Semantic Adversarial Unsupervised Domain Adaptation for Temporal Action Localization WACV 2025

BVINet: Unlocking Blind Video Inpainting with Zero Annotations ICCV 2025

Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval ICCV 2025

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding CVPR 2025

Disentangling Spatio-Temporal Knowledge for Weakly Supervised Object Detection and Segmentation in Surgical Video WACV 2025

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? CVPR 2025

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant CVPR 2025

Background-Aware Moment Detection for Video Moment Retrieval WACV 2025

Moment Quantization for Video Temporal Grounding ICCV 2025

KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception CVPR 2025

Vid-Group: Temporal Video Grounding Pretraining from Unlabeled Videos in the Wild ICCV 2025

Exploiting Frequency Dynamics for Enhanced Multimodal Event-based Action Recognition ICCV 2025

Generic Event Boundary Detection via Denoising Diffusion ICCV 2025

VideoChain: A Transformer-Based Framework for Multi-hop Video Question Generation IJCNLP 2025

What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning ICCV 2025

Efficient Self-Supervised Video Hashing with Selective State Spaces AAAI 2025

TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision ICCV 2025

Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs CVPR 2025