Commonly Uncommon: Semantic Sparsity in Situation Recognition

Mark Yatskar; Vicente Ordonez; Luke Zettlemoyer; Ali Farhadi

2017 CVPR CVPR 2017

Commonly Uncommon: Semantic Sparsity in Situation Recognition

Abstract

Semantic sparsity is a common challenge in structured visual classification problems; when the output space is complex, the vast majority of the possible predictions are rarely, if ever, seen in the training set. This paper studies semantic sparsity in situation recognition, the task of producing structured summaries of what is happening in images, including activities, objects and the roles objects play within the activity. For this problem, we find empirically that most substructures required for prediction are rare, and current state-of-the-art model performance dramatically decreases if even one such rare substructure exists in the target output.We avoid many such errors by (1) introducing a novel tensor composition function that learns to share examples across substructures more effectively and (2) se- mantically augmenting our training data with automatically gathered examples of rarely observed outputs using web data. When integrated within a complete CRF-based structured prediction model, the tensor-based approach outperforms existing state of the art by a relative improvement of 2.11% and 4.40% on top-5 verb and noun-role accuracy, respectively. Adding 5 million images with our semantic aug- mentation techniques gives further relative improvements of 6.23% and 9.57% on top-5 verb and noun-role accuracy.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning and Machine Learning

🧭 Keyword Pioneer — semantic sparsity

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Mark Yatskar , Vicente Ordonez , Luke Zettlemoyer , Ali Farhadi

Topics

Machine Learning > Application Areas > Data Augmentation Computer Vision > Analysis > Scene Understanding Deep Learning > Learning Types > Classification Machine Learning > Core Methods > Structured Prediction Computer Vision > Analysis > Computer Vision

Keywords

transfer learning structured prediction data augmentation tensor decomposition visual classification conditional random field semantic sparsity situation recognition tensor composition

Download PDF

Related papers

Deep Outdoor Illumination Estimation 2017

SRN: Side-output Residual Network for Object Symmetry Detection in the Wild 2017

Weakly Supervised Semantic Segmentation Using Web-Crawled Videos 2017

FASON: First and Second Order Information Fusion Network for Texture Recognition 2017

Recurrent Convolutional Neural Networks for Continuous Sign Language Recognition by Staged Optimization 2017