OED: Towards One-stage End-to-End Dynamic Scene Graph Generation

Guan Wang; Zhimin Li; Qingchao Chen; Yang Liu

2024 CVPR CVPR 2024

OED: Towards One-stage End-to-End Dynamic Scene Graph Generation

Abstract

Dynamic Scene Graph Generation (DSGG) focuses on identifying visual relationships within the spatial-temporal domain of videos. Conventional approaches often employ multi-stage pipelines which typically consist of object detection temporal association and multi-relation classification. However these methods exhibit inherent limitations due to the separation of multiple stages and independent optimization of these sub-problems may yield sub-optimal solutions. To remedy these limitations we propose a one-stage end-to-end framework termed OED which streamlines the DSGG pipeline. This framework reformulates the task as a set prediction problem and leverages pair-wise features to represent each subject-object pair within the scene graph. Moreover another challenge of DSGG is capturing temporal dependencies we introduce a Progressively Refined Module (PRM) for aggregating temporal context without the constraints of additional trackers or handcrafted trajectories enabling end-to-end optimization of the network. Extensive experiments conducted on the Action Genome benchmark demonstrate the effectiveness of our design. The code and models are available at https://github.com/guanw-pku/OED.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Guan Wang , Zhimin Li , Qingchao Chen , Yang Liu

Topics

Computer Vision > Analysis > Object Detection Computer Vision > Analysis > Scene Understanding Computer Vision > Processing > Video Understanding

Keywords

temporal modeling video understanding scene graph generation one-stage end-to-end dynamic scene graph

Download PDF

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024