Action Scene Graphs for Long-Form Understanding of Egocentric Videos

Ivan Rodin; Antonino Furnari; Kyle Min; Subarna Tripathi; Giovanni Maria Farinella

CVPR 2024

Abstract

We present Egocentric Action Scene Graphs (EASGs), a new representation for long-form understanding of egocentric videos. EASGs extend standard manually annotated representations of egocentric videos, such as verb-noun action labels, by providing a temporally evolving graph-based description of the actions performed by the camera wearer, including interacted objects, their relationships, and how actions unfold in time. Through a novel annotation procedure, we extend the Ego4D dataset, adding manually labeled Egocentric Action Scene Graphs, which offer a rich set of annotations for long-form egocentric video understanding. We hence define the EASG generation task and provide a baseline approach, establishing preliminary benchmarks. Experiments on two downstream tasks, action anticipation and activity summarization, highlight the effectiveness of EASGs for long-form egocentric video understanding. We will release the dataset and code to replicate experiments and annotations.
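To make the idea of a temporally evolving action graph concrete, here is a minimal Python sketch of how such a representation might be held in memory. It is an illustration only, not the authors' schema or released code: the class names (`Edge`, `ActionGraph`, `ActionSceneGraphSequence`), the relation labels (`performs`, `direct_object`, `on`, `from`), and the example actions are all assumptions made for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """A directed, labeled relation between two nodes (hypothetical schema)."""
    source: str
    relation: str
    target: str


@dataclass
class ActionGraph:
    """One graph describing a single action of the camera wearer."""
    start_s: float  # assumed: start time of the action, in seconds
    end_s: float    # assumed: end time of the action, in seconds
    verb: str
    edges: list[Edge] = field(default_factory=list)

    def objects(self) -> set[str]:
        """Return every node that the verb node points to."""
        return {e.target for e in self.edges if e.source == self.verb}


@dataclass
class ActionSceneGraphSequence:
    """A sequence of per-action graphs for one video, kept in temporal order."""
    graphs: list[ActionGraph] = field(default_factory=list)

    def add(self, graph: ActionGraph) -> None:
        """Insert a graph and keep the sequence sorted by start time."""
        self.graphs.append(graph)
        self.graphs.sort(key=lambda g: g.start_s)

    def verb_noun_labels(self) -> list[tuple[str, str]]:
        """Collapse each graph to the (verb, noun) pairs a flat label keeps."""
        labels = []
        for g in self.graphs:
            for e in g.edges:
                if e.source == g.verb and e.relation == "direct_object":
                    labels.append((g.verb, e.target))
        return labels


def build_example() -> ActionSceneGraphSequence:
    """Build a two-action toy sequence (invented data, added out of order)."""
    seq = ActionSceneGraphSequence()
    seq.add(ActionGraph(4.0, 6.5, "put", [
        Edge("camera_wearer", "performs", "put"),
        Edge("put", "direct_object", "cup"),
        Edge("put", "on", "table"),
    ]))
    seq.add(ActionGraph(1.0, 3.0, "take", [
        Edge("camera_wearer", "performs", "take"),
        Edge("take", "direct_object", "cup"),
        Edge("take", "from", "shelf"),
    ]))
    return seq
```

Calling `build_example().verb_noun_labels()` reduces the toy sequence to flat verb-noun pairs, which shows what a verb-noun label alone would discard: in this sketch, the `shelf` and `table` relations and the explicit temporal ordering survive only in the graph form.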

Topics

Computer Vision > Analysis > Action Recognition
Computer Vision > Processing > Video Understanding
Computer Vision > Domain-Specific > Egocentric Vision
Knowledge & Reasoning > Representation > Knowledge Graphs
Computer Vision > Analysis > Video Understanding
Artificial Intelligence > Core AI > Knowledge Graph

Keywords

action recognition, object detection, egocentric vision, video understanding, scene graph, graph representation, egocentric video
