Webly Supervised Knowledge Embedding Model for Visual Reasoning

Wenbo Zheng; Lan Yan; Chao Gou; Fei-Yue Wang

CVPR 2020

Abstract

Visual reasoning between a visual image and a natural language description is a long-standing challenge in computer vision. While recent approaches offer great promise through compositionality or relational computing, most of them are hampered by having to train on datasets that contain only a limited number of images with ground-truth texts. Moreover, building a larger dataset by annotating millions of images with text descriptions is extremely time-consuming and difficult, and may very likely lead to a biased model. Inspired by the success of webly supervised learning, we use readily available web images with their noisy annotations to learn a robust representation. Our key idea is to rely on web images and their corresponding tags, together with fully annotated datasets, when learning with knowledge embedding. We present a two-stage approach for the task that augments knowledge through an effective embedding model trained with weakly supervised web data. This approach not only learns knowledge-based embeddings derived from key-value memory networks, making joint and full use of textual and visual information, but also exploits this knowledge through knowledge-based representation learning to improve performance on other general reasoning tasks. Experimental results on two benchmarks show that the proposed approach significantly improves performance compared with state-of-the-art methods and demonstrate the robustness of our model on visual reasoning tasks and other reasoning tasks.
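The abstract says the knowledge-based embeddings are derived from key-value memory networks but gives no further detail. As a rough illustration of the generic key-value memory read (the addressing and reading step popularized by Key-Value Memory Networks), and not the authors' actual model, the sketch below scores each memory key against a query, turns the scores into attention weights with a softmax, and returns the weighted sum of the corresponding values. It assumes dot-product scoring, a single hop, and no learned projection matrices; the function names `kv_memory_read`, `dot`, and `softmax` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    # Plain dot product used as the (assumed) key-matching score.
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kv_memory_read(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
) -> Tuple[Vector, Vector]:
    """One addressing + reading hop over a key-value memory (hypothetical sketch).

    Addressing: each key is scored against the query and the scores are
    normalized with a softmax. Reading: the output is the attention-weighted
    sum of the values aligned with those keys.
    """
    if not keys or len(keys) != len(values):
        raise ValueError("keys and values must be non-empty and aligned")
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    out = [0.0] * dim
    for w, v in zip(weights, values):
        for j in range(dim):
            out[j] += w * v[j]
    return out, weights


if __name__ == "__main__":
    # A query close to the first key retrieves mostly the first value.
    out, weights = kv_memory_read(
        [5.0, 0.0],
        keys=[[1.0, 0.0], [0.0, 1.0]],
        values=[[10.0, 0.0], [0.0, 10.0]],
    )
    print(weights, out)
```

Separating keys (what is matched) from values (what is returned) is the design choice that distinguishes this memory from plain attention over a single set of vectors. In a multimodal setting, one might, for instance, key on tag embeddings while returning visual features, but that pairing is an assumption made here for illustration.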

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Learning Types > Weakly Supervised Learning
Deep Learning > Architectures > Neural Networks
Computer Vision > Analysis > Scene Understanding
Knowledge & Reasoning > Representation > Knowledge Graphs
Machine Learning > Learning Paradigms > Self-Supervised Learning
Deep Learning > Learning Types > Multi-Modal Learning
Computer Vision > Analysis > Visual Question Answering

Keywords

representation learning, weakly supervised learning, multi-modal learning, visual reasoning, knowledge graph, memory network, knowledge embedding, webly supervised learning
