Graphical Contrastive Losses for Scene Graph Parsing

Ji Zhang; Kevin J. Shih; Ahmed Elgammal; Andrew Tao; Bryan Catanzaro

CVPR 2019


Abstract

Most scene graph parsers use a two-stage pipeline to detect visual relationships: the first stage detects entities, and the second predicts the predicate for each entity pair using a softmax distribution. We find that such pipelines, trained with only a cross-entropy loss over predicate classes, suffer from two common errors. The first, Entity Instance Confusion, occurs when the model confuses multiple instances of the same type of entity (e.g., multiple cups). The second, Proximal Relationship Ambiguity, arises when multiple subject-predicate-object triplets with the same predicate appear in close proximity, and the model struggles to infer the correct subject-object pairings (e.g., mis-pairing musicians and their instruments). We propose a set of contrastive loss formulations that specifically target these types of errors within the scene graph parsing problem, collectively termed the Graphical Contrastive Losses. These losses explicitly force the model to disambiguate related and unrelated instances through margin constraints specific to each type of confusion. To demonstrate the efficacy of the proposed losses, we build a relationship detector on this two-stage pipeline, called RelDN. Our model outperforms the winning method of the OpenImages Relationship Detection Challenge by 4.7% (16.5% relative) on the test set. We also show improved results over the best previous methods on the Visual Genome and Visual Relationship Detection datasets.
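The idea of a margin constraint that separates related from unrelated instances can be illustrated with a small sketch. This is not the RelDN implementation and not the paper's exact loss: it assumes a matrix of subject-object affinity scores (for example, one minus a predicted "no relationship" probability) and a boolean mask of ground-truth related pairs, and it applies a hinge penalty whenever a subject's weakest related object fails to outscore its strongest unrelated object by at least a margin. The function name `margin_contrastive_loss` and the `margin` value are hypothetical.

```python
import numpy as np


def margin_contrastive_loss(affinity, pos_mask, margin=0.2):
    """Hypothetical hinge loss on the gap between the weakest related pair
    and the strongest unrelated pair for each subject (one row per subject).

    affinity: (S, O) array of subject-object affinity scores.
    pos_mask: (S, O) boolean array, True where subject i is related to object j.
    margin:   assumed margin hyperparameter (illustrative value).
    """
    losses = []
    for scores, pos in zip(affinity, pos_mask):
        neg = ~pos
        # A subject with no related or no unrelated objects imposes no constraint.
        if not pos.any() or not neg.any():
            continue
        # Positive gap means every related object outscores every unrelated one.
        gap = scores[pos].min() - scores[neg].max()
        losses.append(max(0.0, margin - gap))
    return float(np.mean(losses)) if losses else 0.0


# Toy example: two subjects, two candidate objects (e.g. two cups).
# Object 0 is the true partner of both subjects; object 1 is a distractor.
affinity = np.array([[0.9, 0.4],
                     [0.5, 0.6]])
pos_mask = np.array([[True, False],
                     [True, False]])
loss = margin_contrastive_loss(affinity, pos_mask, margin=0.2)
```

In the toy call, the first subject already separates its partner from the distractor by more than the margin and contributes nothing, while the second scores the unrelated instance higher (an Entity Instance Confusion) and is penalized. Applying the same function to the transposed arrays would give an analogous object-side term; how the paper weights or combines such terms per confusion type is not covered here.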

Topics

Machine Learning > Learning Types > Contrastive Learning
Computer Vision > Analysis > Object Detection
Computer Vision > Analysis > Scene Understanding
Deep Learning > Learning Types > Contrastive Learning

Keywords

contrastive learning, object detection, contrastive loss, entity recognition, entity detection, visual relationship, scene graph parsing, visual relationship detection, predicate prediction, margin constraint, relationship detection


Related papers

Fast Single Image Reflection Suppression via Convex Optimization 2019

Learning Video Representations From Correspondence Proposals 2019

ATOM: Accurate Tracking by Overlap Maximization 2019

Visual Tracking via Adaptive Spatially-Regularized Correlation Filters 2019

Edge-Labeling Graph Neural Network for Few-Shot Learning 2019