Action Recognition

1421 directly classified papers

Papers

Text-Guided Nonverbal Enhancement Based on Modality-Invariant and -Specific Representations for Video Speaking Style Recognition AAAI 2025

Towards Efficient General Feature Prediction in Masked Skeleton Modeling ICCV 2025

H-MoRe: Learning Human-centric Motion Representation for Action Analysis CVPR 2025

Background-Aware Moment Detection for Video Moment Retrieval WACV 2025

Sign Language Recognition: A Large-Scale Multi-View Dataset and Comprehensive Evaluation WACV 2025

HUMOTO: A 4D Dataset of Mocap Human Object Interactions ICCV 2025

ProGait: A Multi-Purpose Video Dataset and Benchmark for Transfemoral Prosthesis Users ICCV 2025

SV-data2vec: Guiding Video Representation Learning with Latent Skeleton Targets WACV 2025

Beyond Image Classification: A Video Benchmark and Dual-Branch Hybrid Discrimination Framework for Compositional Zero-Shot Learning CVPR 2025

Snakes and Ladders: Two Steps Up for VideoMamba ICCV 2025

Click&Describe: Multimodal Grounding and Tracking for Aerial Objects WACV 2025

CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement CVPR 2025

ActionDiffusion: An Action-Aware Diffusion Model for Procedure Planning in Instructional Videos WACV 2025

D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching AAAI 2025

HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models ACL 2025

Joint Self-Supervised Video Alignment and Action Segmentation ICCV 2025

CLOT: Closed Loop Optimal Transport for Unsupervised Action Segmentation ICCV 2025

Learning to Visually Connect Actions and their Effects WACV 2025

TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation CVPR 2025

Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models CVPR 2025

SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction ACL 2025

TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation CVPR 2025

DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer CVPR 2025

MistSense: Versatile Online Detection of Procedural and Execution Mistakes ICCV 2025

Skeleton Motion Words for Unsupervised Skeleton-Based Temporal Action Segmentation ICCV 2025