Unsupervised Textual Grounding: Linking Words to Image Concepts

Raymond A. Yeh; Minh N. Do; Alexander G. Schwing

CVPR 2018

Abstract

Textual grounding, i.e., linking words to objects in images, is a challenging but important task for robotics and human-computer interaction. Existing techniques benefit from recent progress in deep learning and generally formulate the task as a supervised learning problem, selecting a bounding box from a set of possible options. Training these deep-net-based approaches requires access to large-scale datasets; however, constructing such datasets is time-consuming and expensive. Therefore, we develop a completely unsupervised mechanism for textual grounding that uses hypothesis testing to link words to detected image concepts. We demonstrate our approach on the ReferIt Game dataset and the Flickr30k dataset, outperforming baselines by 7.98% and 6.96%, respectively.
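The abstract does not say which statistical test is used to link a word to an image concept. The following is a minimal sketch of the general idea under stated assumptions, not the authors' method: each training phrase is assumed to come with the set of concept labels a detector fired on in its image, and a one-sided two-proportion z-test (a stand-in chosen here for illustration) checks whether a concept appears significantly more often when a word is present than when it is absent. The functions `two_proportion_z`, `one_sided_p_value`, and `link_words_to_concepts`, the `(words, concepts)` data layout, and the `alpha` threshold are all hypothetical.

```python
import math
from collections import defaultdict


def two_proportion_z(k1, n1, k2, n2):
    """Pooled z statistic for H0: p1 == p2 against H1: p1 > p2."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)  # pooled proportion under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (p1 - p2) / se


def one_sided_p_value(z):
    """Upper-tail probability P(Z >= z) of a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def link_words_to_concepts(samples, alpha=0.05):
    """Hypothetical sketch, not the paper's exact test.

    samples: list of (words, concepts) pairs, one per phrase, where
    `concepts` holds the detector labels found in the phrase's image.
    Returns {word: [concepts that occur significantly more often when
    the word is present than when it is absent]}.
    """
    vocab = set().union(*(w for w, _ in samples))
    concepts = set().union(*(c for _, c in samples))
    links = defaultdict(list)
    for word in sorted(vocab):
        with_w = [c for w, c in samples if word in w]
        without_w = [c for w, c in samples if word not in w]
        if not with_w or not without_w:
            continue  # the test needs both groups (e.g. skips "the")
        for concept in sorted(concepts):
            k1 = sum(concept in c for c in with_w)
            k2 = sum(concept in c for c in without_w)
            z = two_proportion_z(k1, len(with_w), k2, len(without_w))
            if one_sided_p_value(z) < alpha:
                links[word].append(concept)
    return dict(links)


if __name__ == "__main__":
    toy = ([({"the", "dog"}, {"dog_box"})] * 10
           + [({"the", "cat"}, {"cat_box"})] * 10)
    print(link_words_to_concepts(toy))
```

On the toy data, "dog" links to `dog_box` and "cat" to `cat_box`, while "the" occurs in every phrase and is skipped. The normal approximation behind the z-test is rough for small counts, so an exact test would be a reasonable substitute in practice.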

Topics

Machine Learning > Learning Types > Unsupervised Learning
Computer Vision > Analysis > Object Detection
Deep Learning > Learning Types > Unsupervised Learning
Artificial Intelligence > Core AI > Information Extraction
Computer Vision > Applications > Visual Question Answering

Keywords

unsupervised learning, object detection, hypothesis testing, bounding box, textual grounding, word to image linking