Video Understanding

1592 directly classified papers

Papers

Spatiotemporal Blind-Spot Network with Calibrated Flow Alignment for Self-Supervised Video Denoising AAAI 2025

Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models CVPR 2025

Unsupervised Video Highlight Detection by Learning from Audio and Visual Recurrence WACV 2025

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation CVPR 2025

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers ICCV 2025

LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents ICCV 2025

Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding ICCV 2025

Language Repository for Long Video Understanding ACL 2025

Streaming VideoLLMs for Real-Time Procedural Video Understanding ICCV 2025

HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation CVPR 2025

VisTRA: Visual Tool-use Reasoning Analyzer for Small Object Visual Question Answering ACL 2025

TACO: Taming Diffusion for in-the-wild Video Amodal Completion ICCV 2025

Online Generic Event Boundary Detection ICCV 2025

Efficient Motion-Aware Video MLLM CVPR 2025

Multimodal Fusion and Coherence Modeling for Video Topic Segmentation ACL 2025

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance ICCV 2025

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction CVPR 2025

MDIT-Bench: Evaluating the Dual-Implicit Toxicity in Large Multimodal Models ACL 2025

Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering CVPR 2025

One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory ICCV 2025

BVINet: Unlocking Blind Video Inpainting with Zero Annotations ICCV 2025

HERO: Human Reaction Generation from Videos ICCV 2025

Diffusion-based 3D Hand Motion Recovery with Intuitive Physics ICCV 2025

Predicting Implicit Arguments in Procedural Video Instructions ACL 2025

MoSiC: Optimal-Transport Motion Trajectory for Dense Self-Supervised Learning ICCV 2025