WACV 2024

Refine and Redistribute: Multi-Domain Fusion and Dynamic Label Assignment for Unbiased Scene Graph Generation

Abstract

Scene Graph Generation (SGG) plays an important role in visual image comprehension. However, existing approaches often struggle to represent implicit relationship features, which limits their ability to distinguish predicates. They are also vulnerable to skewed instance distributions, which impair effective training for fine-grained predicates. To address these problems, we propose a novel feature refinement and data redistribution framework (RAR). Specifically, a multi-domain fusion (MDF) module is designed to acquire comprehensive predicate representations by integrating global knowledge from the contextual domain with local details from the spatial-frequency domains. We then introduce a dynamic label assignment (DLA) strategy to tackle the long-tailed problem: predicate categories are adaptively grouped to accommodate varying training conditions. Guided by this strategy, we leverage a hierarchical auto-encoder to generate Siamese samples, expanding the label cardinality. Furthermore, we explore the updated sample space to derive reliable samples and assign tailored labels, ultimately rebalancing the data. Experiments on Visual Genome (VG) and GQA demonstrate that our model helps correct prediction bias and achieves a significant improvement of approximately 10% in mean recall compared to baseline models.
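To illustrate why mean recall is the metric reported for the long-tailed setting, the following minimal sketch contrasts plain recall with mean recall (the average of per-predicate recalls). It is not the paper's evaluation code: the function name `recall_and_mean_recall`, the toy labels, and the hit flags are assumptions made for illustration, and the sketch pools all triplets at once, whereas standard SGG benchmarks compute the metric over top-K predictions per image.

```python
from collections import defaultdict


def recall_and_mean_recall(gt_labels, hits):
    """Compare plain recall with mean recall over predicate classes.

    gt_labels: predicate label of each ground-truth triplet.
    hits: whether each ground-truth triplet was recovered by the model.
    Both inputs are hypothetical toy data, not results from the paper.
    """
    total = defaultdict(int)
    matched = defaultdict(int)
    for label, hit in zip(gt_labels, hits):
        total[label] += 1
        matched[label] += int(hit)

    # Plain recall: dominated by frequent (head) predicates.
    recall = sum(matched.values()) / sum(total.values())

    # Mean recall: every predicate class counts equally.
    per_class = {c: matched[c] / total[c] for c in total}
    mean_recall = sum(per_class.values()) / len(per_class)
    return recall, mean_recall, per_class


# Toy case: a head predicate ("on") that is always recovered and a
# fine-grained tail predicate ("standing on") that is always missed.
gt = ["on"] * 8 + ["standing on"] * 2
hits = [True] * 8 + [False] * 2
recall, mean_recall, per_class = recall_and_mean_recall(gt, hits)
print(f"recall={recall:.2f}, mean recall={mean_recall:.2f}")
```

In this toy case plain recall is 0.80 while mean recall is 0.50, so a model biased toward head predicates can score well on recall while failing entirely on fine-grained ones.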
