Grounding Semantic Roles in Images

Carina Silberer; Manfred Pinkal

EMNLP 2018

Abstract

We address the task of visual semantic role labeling (vSRL), the identification of the participants of a situation or event in a visual scene, and their labeling with their semantic relations to the event or situation. We render candidate participants as image regions of objects, and train a model which learns to ground roles in the regions which depict the corresponding participant. Experimental results demonstrate that we can train a vSRL model without reliance on prohibitive image-based role annotations, by utilizing noisy data which we extract automatically from image captions using a linguistic SRL system. Furthermore, our model induces frame-semantic visual representations, and their comparison to previous work on supervised visual verb sense disambiguation yields overall better results.

Topics

Computer Vision > Analysis > Scene Understanding
Natural Language Processing > Understanding > Semantic Analysis
Knowledge & Reasoning > Representation > Knowledge Representation
Interdisciplinary > Linguistics > Semantics
Computer Vision > Core AI > Multimodal Learning
Computer Vision > Core AI > Computer Vision
Natural Language Processing > Applications > Semantic Parsing

Keywords

event understanding, semantic parsing, multimodal learning, frame semantics, visual semantic role labeling, image region, verb sense disambiguation, semantic role grounding, frame-semantic visual representation, participant identification, linguistic SRL
