Knowledge-Enhanced Scene Graph Generation with Multimodal Relation Alignment (Student Abstract)

Ze Fu; Junhao Feng; Changmeng Zheng; Yi Cai

AAAI 2022


Abstract

Existing scene graph generation methods are limited when an image lacks sufficient visual context. To address this limitation, we propose a knowledge-enhanced scene graph generation model with multimodal relation alignment, which supplements the missing visual context with well-aligned textual knowledge. First, we encode textual information as contextualized knowledge, guided by the visual objects, to enrich the context. Furthermore, we align the multimodal relation triplets with a co-attention module for better semantic fusion. Experimental results demonstrate the effectiveness of our method.
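The two steps named in the abstract (object-guided contextualization of textual knowledge, then co-attention alignment of visual and textual relation triplets) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the feature dimension d, the scaled dot-product form of the object-guided step, the bilinear affinity matrix W, and fusion by concatenation are all assumptions chosen to show a generic co-attention mechanism, with random arrays standing in for learned features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def object_guided_knowledge(objects, knowledge):
    """Hypothetical object-guided step: each visual object feature queries
    the textual knowledge embeddings with scaled dot-product attention."""
    d = objects.shape[1]
    weights = softmax(objects @ knowledge.T / np.sqrt(d), axis=1)  # (n_obj, n_k)
    # Residual update: object features enriched with attended knowledge.
    return objects + weights @ knowledge

def co_attention(visual, textual, W):
    """Generic parallel co-attention between visual and textual relation
    triplet features (shapes (n_v, d) and (n_t, d)); W is a hypothetical
    learned (d, d) bilinear affinity matrix."""
    affinity = visual @ W @ textual.T             # (n_v, n_t) pairwise scores
    v2t = softmax(affinity, axis=1) @ textual     # visual rows attend over text
    t2v = softmax(affinity.T, axis=1) @ visual    # text rows attend over vision
    fused_v = np.concatenate([visual, v2t], axis=1)   # (n_v, 2d)
    fused_t = np.concatenate([textual, t2v], axis=1)  # (n_t, 2d)
    return fused_v, fused_t, affinity

# Random placeholders for learned features (illustration only).
rng = np.random.default_rng(0)
d = 8
objects = rng.standard_normal((4, d))
knowledge = rng.standard_normal((6, d))
enriched = object_guided_knowledge(objects, knowledge)

visual = rng.standard_normal((5, d))
textual = rng.standard_normal((3, d))
W = rng.standard_normal((d, d)) / np.sqrt(d)
fused_v, fused_t, aff = co_attention(visual, textual, W)
print(enriched.shape, fused_v.shape, fused_t.shape)  # (4, 8) (5, 16) (3, 16)
```

Concatenation is used for fusion here only for simplicity; a gated sum or a projection back to d dimensions would be equally plausible, and the abstract does not say which the model uses.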

Topics

Machine Learning > Core Methods > Representation Learning
Computer Vision > Analysis > Scene Understanding
Computer Vision > Generation > Image Generation
Computer Vision > Core AI > Multimodal Learning
Machine Learning > Learning Types > Multi-Modal Learning
Artificial Intelligence > Core AI > Knowledge Graph
Artificial Intelligence > Core AI > Multi-Modal Learning

Keywords

visual question answering, attention mechanism, multimodal learning, scene graph generation, visual context, knowledge graph, knowledge enhancement, semantic fusion, multimodal relation alignment, textual knowledge enhancement, visual context supplement, co-attention module