Tell Me What You See and I will Show You Where It Is

Jia Xu; Alexander G. Schwing; Raquel Urtasun

2014 CVPR CVPR 2014

Tell Me What You See and I will Show You Where It Is

Abstract

We tackle the problem of weakly labeled semantic segmentation, where the only source of annotation are image tags encoding which classes are present in the scene. This is an extremely difficult problem as no pixel-wise labelings are available, not even at training time. In this paper, we show that this problem can be formalized as an instance of learning in a latent structured prediction framework, where the graphical model encodes the presence and absence of a class as well as the assignments of semantic labels to superpixels. As a consequence, we are able to leverage standard algorithms with good theoretical properties. We demonstrate the effectiveness of our approach using the challenging SIFT-flow dataset and show average per-class accuracy improvements of 7% over the state-of-the-art.

🌉 Interdisciplinary Bridge — Computer Vision and Machine Learning

🧭 Keyword Pioneer — latent structured prediction

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Jia Xu , Alexander G. Schwing , Raquel Urtasun

Topics

Machine Learning > Learning Types > Weakly Supervised Learning Computer Vision > Processing > Semantic Segmentation Machine Learning > Core Methods > Structured Prediction

Keywords

semantic segmentation structured prediction weakly supervised learning graphical model weakly supervised latent structured prediction

Download PDF

Related papers

Efficient Nonlinear Markov Models for Human Motion 2014

Occlusion Geodesics for Online Multi-Object Tracking 2014

A Principled Approach for Coarse-to-Fine MAP Inference 2014

Locally Optimized Product Quantization for Approximate Nearest Neighbor Search 2014

Fast and Accurate Image Matching with Cascade Hashing for 3D Reconstruction 2014