FineSports: A Multi-person Hierarchical Sports Video Dataset for Fine-grained Action Understanding

Jinglin Xu; Guohao Zhao; Sibo Yin; Wenhao Zhou; Yuxin Peng

2024 CVPR CVPR 2024

FineSports: A Multi-person Hierarchical Sports Video Dataset for Fine-grained Action Understanding

Abstract

Fine-grained action analysis in multi-person sports is complex due to athletes' quick movements and intense physical confrontations which result in severe visual obstructions in most scenes. In addition accessible multi-person sports video datasets lack fine-grained action annotations in both space and time adding to the difficulty in fine-grained action analysis. To this end we construct a new multi-person basketball sports video dataset named FineSports which contains fine-grained semantic and spatial-temporal annotations on 10000 NBA game videos covering 52 fine-grained action types 16000 action instances and 123000 spatial-temporal bounding boxes. We also propose a new prompt-driven spatial-temporal action location approach called PoSTAL composed of a prompt-driven target action encoder (PTA) and an action tube-specific detector (ATD) to directly generate target action tubes with fine-grained action types without any off-line proposal generation. Extensive experiments on the FineSports dataset demonstrate that PoSTAL outperforms state-of-the-art methods. Data and code are available at https://github.com/PKU-ICST-MIPL/FineSports_CVPR2024.

🧭 Keyword Pioneer — spatial-temporal annotation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Jinglin Xu , Guohao Zhao , Sibo Yin , Wenhao Zhou , Yuxin Peng

Topics

Computer Vision > Analysis > Action Recognition Computer Vision > Processing > Video Understanding Computer Vision > Analysis > Video Understanding

Keywords

action recognition action localization fine-grained action multi-person video fine-grained action recognition sports video spatial-temporal annotation spatial-temporal localization prompt-driven detection

Download PDF

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024