PhraseCut: Language-Based Image Segmentation in the Wild

Chenyun Wu; Zhe Lin; Scott Cohen; Trung Bui; Subhransu Maji

CVPR 2020

Abstract

We consider the problem of segmenting image regions given a natural language phrase, and study it on a novel dataset of 77,262 images and 345,486 phrase-region pairs. Our dataset is collected on top of the Visual Genome dataset and uses the existing annotations to generate a challenging set of referring phrases for which the corresponding regions are manually annotated. Phrases in our dataset correspond to multiple regions and describe a large number of object and stuff categories as well as their attributes such as color, shape, parts, and relationships with other entities in the image. Our experiments show that the scale and diversity of concepts in our dataset pose significant challenges to existing state-of-the-art methods. We systematically handle the long-tail nature of these concepts and present a modular approach to combine category, attribute, and relationship cues that outperforms existing approaches.

Authors

Chenyun Wu, Zhe Lin, Scott Cohen, Trung Bui, Subhransu Maji

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Computer Vision > Generation > Image Captioning
Computer Vision > Processing > Image Segmentation
Computer Vision > Core AI > Multimodal Learning
Artificial Intelligence > Core AI > Computer Vision
Natural Language Processing > Applications > Visual Question Answering
Deep Learning > Learning Types > Multi-Modal Learning

Keywords

semantic segmentation, image segmentation, natural language processing, multimodal learning, referring expression, visual grounding, visual genome, language-based segmentation, phrase-region matching
