HumMUSS: Human Motion Understanding using State Space Models

Arnab Mondal; Stefano Alletto; Denis Tome

CVPR 2024

Abstract

Understanding human motion from video is essential for a range of applications, including pose estimation, mesh recovery, and action recognition. While state-of-the-art methods predominantly rely on transformer-based architectures, these approaches have limitations in practical scenarios. Transformers are slow when predicting sequentially on a continuous stream of frames in real time, and they do not generalize to new frame rates. In light of these constraints, we propose a novel attention-free spatiotemporal model for human motion understanding, building upon recent advancements in state space models. Our model not only matches the performance of transformer-based models in various motion understanding tasks but also brings added benefits, such as adaptability to different video frame rates and faster training on longer sequences of keypoints. Moreover, the proposed model supports both offline and real-time applications. For real-time sequential prediction, our model is both memory efficient and several times faster than transformer-based approaches while maintaining their high accuracy.
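The properties claimed in the abstract (offline and streaming use, fixed memory per frame, and adaptability to frame rate) follow from how linear state space models are discretized in general. The sketch below illustrates this with a generic diagonal state space layer in NumPy. It is not the HumMUSS architecture: the function names, the zero-order-hold discretization, and the parameter values are all assumptions chosen for illustration.

```python
import numpy as np

def discretize(a, b, dt):
    """Zero-order-hold discretization of the diagonal system x'(t) = a*x(t) + b*u(t).

    dt is the sampling interval, e.g. 1 / frames_per_second (an assumption here).
    """
    a_bar = np.exp(a * dt)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def run_offline(a_bar, b_bar, c, u):
    """Process a whole sequence at once as a causal convolution."""
    length = len(u)
    # kernel[j] = sum_n c[n] * a_bar[n]**j * b_bar[n]
    powers = a_bar[None, :] ** np.arange(length)[:, None]
    kernel = (powers * b_bar * c).sum(axis=1)
    return np.convolve(u, kernel)[:length]

def run_streaming(a_bar, b_bar, c, u):
    """Process one frame at a time, keeping only a fixed-size state."""
    x = np.zeros_like(a_bar)
    outputs = []
    for u_t in u:
        x = a_bar * x + b_bar * u_t  # state update, constant cost per frame
        outputs.append(float(c @ x))
    return np.array(outputs)

# Hypothetical parameters for a 3-dimensional state (illustrative values only).
a = -np.array([0.5, 1.0, 2.0])
b = np.ones(3)
c = np.array([0.2, 0.3, 0.5])

# Offline and streaming modes give the same outputs for the same input.
u = np.sin(np.linspace(0.0, 3.0, 20))
a30, b30 = discretize(a, b, 1.0 / 30.0)
same = np.allclose(run_offline(a30, b30, c, u), run_streaming(a30, b30, c, u))

# Changing the frame rate only changes dt; the continuous parameters are reused.
a60, b60 = discretize(a, b, 1.0 / 60.0)
y30 = run_streaming(a30, b30, c, np.ones(8))    # 8 frames at 30 fps
y60 = run_streaming(a60, b60, c, np.ones(16))   # 16 frames at 60 fps
rates_agree = np.allclose(y60[1::2], y30)       # compare at matching timestamps
```

In this sketch the streaming loop stores only the state vector `x`, so memory does not grow with the number of frames seen. The frame-rate comparison uses a constant input, for which zero-order hold is exact; for a general input the two frame rates would agree only approximately.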

Topics

Deep Learning > Architectures > Neural Networks
Computer Vision > Analysis > Action Recognition
Computer Vision > Analysis > Human Pose Estimation

Keywords

action recognition, pose estimation, video understanding, human motion, state space model, spatiotemporal model, mesh recovery, human motion understanding
