3DInAction: Understanding Human Actions in 3D Point Clouds

Yizhak Ben-Shabat; Oren Shrout; Stephen Gould

CVPR 2024

Abstract

We propose a novel method for 3D point cloud action recognition. Understanding human actions in RGB videos has been widely studied in recent years; however, its 3D point cloud counterpart remains under-explored despite the clear value that 3D information may bring. This is mostly due to the inherent limitation of the point cloud data modality (lack of structure, permutation invariance, and a varying number of points), which makes it difficult to learn a spatio-temporal representation. To address this limitation, we propose the 3DInAction pipeline, which first estimates patches moving in time (t-patches) as a key building block, alongside a hierarchical architecture that learns an informative spatio-temporal representation. We show that our method achieves improved performance on existing datasets, including DFAUST and IKEA ASM. Code is publicly available at https://github.com/sitzikbs/3dincaction
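The abstract does not say how t-patches are built, so the following is only a minimal sketch of one plausible reading: pick an anchor point, group its k nearest neighbours into a patch, and carry the anchor forward by snapping it to its nearest neighbour in each later frame. The function names (`nearest_index`, `knn_patch`, `extract_t_patch`), the kNN grouping, and the nearest-neighbour tracking rule are all assumptions made for illustration and are not taken from the authors' code. The sketch uses only the Python standard library.

```python
import math
import random


def nearest_index(query, cloud):
    """Index of the point in `cloud` closest to `query` (Euclidean distance)."""
    return min(range(len(cloud)), key=lambda i: math.dist(query, cloud[i]))


def knn_patch(anchor, cloud, k):
    """The k points of `cloud` closest to `anchor`, nearest first."""
    return sorted(cloud, key=lambda p: math.dist(anchor, p))[:k]


def extract_t_patch(frames, start_index, k):
    """Follow one anchor through `frames`, collecting a k-point patch per frame.

    frames: list of point clouds. Each cloud is a list of (x, y, z) tuples and
    may differ from the others in length and in point ordering.
    """
    anchor = frames[0][start_index]
    patches = []
    for cloud in frames:
        # Assumed tracking rule: re-anchor on the closest point in this frame.
        anchor = cloud[nearest_index(anchor, cloud)]
        patches.append(knn_patch(anchor, cloud, k))
    return patches


if __name__ == "__main__":
    rng = random.Random(0)
    base = [(rng.random(), rng.random(), rng.random()) for _ in range(50)]
    frames = []
    for t in range(4):
        # Each frame drifts along x, loses t points, and is reshuffled,
        # mimicking an unordered cloud with a varying number of points.
        moved = [(x + 0.01 * t, y, z) for (x, y, z) in base[: 50 - t]]
        rng.shuffle(moved)
        frames.append(moved)

    t_patch = extract_t_patch(frames, start_index=0, k=8)
    print(len(t_patch), [len(p) for p in t_patch])  # 4 [8, 8, 8, 8]
```

Because every frame yields a patch of exactly k points regardless of the cloud's size or ordering, the result is a fixed-size sequence over time, which is the kind of regular structure the abstract says raw point clouds lack.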

Authors

Yizhak Ben-Shabat, Oren Shrout, Stephen Gould

Topics

Computer Vision > Analysis > 3D Vision

Computer Vision > Analysis > Action Recognition

Deep Learning > Learning Types > Self-Supervised Learning

Computer Vision > Domain-Specific > 3D Vision

Keywords

action recognition, point cloud, point cloud processing, 3D point cloud, hierarchical architecture, spatio-temporal representation
