Video Understanding

1592 directly classified papers

Papers

VILLS: Video-Image Learning to Learn Semantics for Person Re-Identification WACV 2025

Event-Guided Low-Light Video Semantic Segmentation WACV 2025

Spatiotemporal Blind-Spot Network with Calibrated Flow Alignment for Self-Supervised Video Denoising AAAI 2025

BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding CVPR 2025

AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM CVPR 2025

MLVU: Benchmarking Multi-task Long Video Understanding CVPR 2025

LoSA: Long-Short-Range Adapter for Scaling End-to-End Temporal Action Localization WACV 2025

Temporally Grounding Instructional Diagrams in Unconstrained Videos WACV 2025

GaraMoSt: Parallel Multi-Granularity Motion and Structural Modeling for Efficient Multi-Frame Interpolation in DSA Images AAAI 2025

A Video-grounded Dialogue Dataset and Metric for Event-driven Activities AAAI 2025

Image-to-video Adaptation with Outlier Modeling and Robust Self-learning AAAI 2025

Federated Weakly Supervised Video Anomaly Detection with Multimodal Prompt AAAI 2025

OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts CVPR 2025

Exploring Fine-Grained Human Motion Video Captioning COLING 2025

Dense Audio-Visual Event Localization Under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration AAAI 2025

ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context AAAI 2025

When the Future Becomes the Past: Taming Temporal Correspondence for Self-supervised Video Representation Learning CVPR 2025

Exploring Temporal Event Cues for Dense Video Captioning in Cyclic Co-Learning AAAI 2025

Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection AAAI 2025

Revisiting Audio-Visual Segmentation with Vision-Centric Transformer CVPR 2025

FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding WACV 2025

Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning AAAI 2025

ALLVB: All-in-One Long Video Understanding Benchmark AAAI 2025

Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities CVPR 2025

Paladin: Understanding Video Intentions in Political Advertisement Videos WACV 2025