A Sentence Is Worth a Thousand Pixels

Sanja Fidler; Abhishek Sharma; Raquel Urtasun

CVPR 2013

Abstract

We are interested in holistic scene understanding where images are accompanied by text in the form of complex sentential descriptions. We propose a holistic conditional random field model for semantic parsing which reasons jointly about which objects are present in the scene, their spatial extent, and semantic segmentation, and which takes both text and image information as input. We automatically parse the sentences, extract objects and their relationships, and incorporate them into the model, both via potentials and by re-ranking candidate detections. We demonstrate the effectiveness of our approach on the challenging UIUC sentences dataset and show segmentation improvements of 12.5% over the visual-only model and detection improvements of 5% AP over deformable part-based models [8].
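To make the idea of re-ranking candidate detections with text more concrete, here is a minimal illustrative sketch. It is not the paper's method: the `Detection` type, the `rerank_with_text` function, and the additive `bonus`/`penalty` values are all hypothetical, and the paper's actual model combines text cues through CRF potentials that this toy example does not reproduce. It only shows how object classes mentioned in a sentence could shift detector scores before sorting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical candidate detection: a class label and a detector score."""
    label: str
    score: float  # detector confidence, higher is better


def rerank_with_text(detections, mentioned, bonus=0.5, penalty=0.5):
    """Sort detections by a text-adjusted score (illustrative assumption only).

    Classes named in the sentence gain `bonus`; all other classes lose
    `penalty`. Both values are arbitrary and not taken from the paper.
    """
    def adjusted(det):
        shift = bonus if det.label in mentioned else -penalty
        return det.score + shift

    return sorted(detections, key=adjusted, reverse=True)


# Candidate detections as a visual-only detector might rank them.
detections = [
    Detection("sofa", 0.9),
    Detection("cat", 0.7),
    Detection("dog", 0.6),
]

# Object classes extracted from a sentence such as "A dog sits on the sofa."
mentioned = {"dog", "sofa"}

reranked = rerank_with_text(detections, mentioned)
for det in reranked:
    print(det.label, det.score)
```

In this toy setting the unmentioned "cat" detection drops below "dog" even though its raw detector score is higher, which is the kind of effect text-based re-ranking is meant to have.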

Authors

Sanja Fidler, Abhishek Sharma, Raquel Urtasun

Topics

Machine Learning > Core Methods > Classification
Computer Vision > Analysis > Scene Understanding
Computer Vision > Processing > Semantic Segmentation
Artificial Intelligence > Core AI > Computer Vision
Deep Learning > Learning Types > Multi-Modal Learning

Keywords

semantic segmentation, scene understanding, object detection, conditional random field, visual language, holistic scene understanding, holistic parsing
