Excitation Backprop for RNNs

Sarah Adel Bargal; Andrea Zunino; Donghyun Kim; Jianming Zhang; Vittorio Murino; Stan Sclaroff

CVPR 2018

Abstract

Deep models are state-of-the-art for many vision tasks, including video action recognition and video captioning. Models are trained to caption or classify activity in videos, but little is known about the evidence used to make such decisions. Grounding decisions made by deep networks has been studied in spatial visual content, giving more insight into model predictions for images. However, such studies are relatively lacking for models of spatiotemporal visual content, that is, videos. In this work, we devise a formulation that simultaneously grounds evidence in space and time, in a single pass, using top-down saliency. We visualize the spatiotemporal cues that contribute to a deep model's classification/captioning output using the model's internal representation. Based on these spatiotemporal cues, we are able to localize segments within a video that correspond to a specific action, or to a phrase from a caption, without explicitly optimizing/training for these tasks.
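The abstract does not spell out the formulation itself. As background only, the sketch below illustrates the general excitation backprop idea for a single fully connected layer: a probability assigned to each output neuron is passed down to the inputs in proportion to activation times positive weight. This is a minimal illustration under that assumption, not the spatiotemporal RNN formulation of this paper; the function name `excitation_backprop_layer` and the toy numbers are hypothetical.

```python
def excitation_backprop_layer(activations, weights, top_probs):
    """Pass winning probabilities from a layer's outputs down to its inputs.

    activations: non-negative input activations a_i
    weights:     weights[j][i] connects input i to output j
    top_probs:   probability P(a_j) assigned to each output neuron
    """
    n_in = len(activations)
    bottom_probs = [0.0] * n_in
    for j, p_j in enumerate(top_probs):
        # Only excitatory (positive-weight) connections carry probability.
        contrib = [activations[i] * max(weights[j][i], 0.0) for i in range(n_in)]
        total = sum(contrib)
        if total <= 0.0:
            continue  # no excitatory input for this output; its mass is dropped
        for i in range(n_in):
            bottom_probs[i] += p_j * contrib[i] / total
    return bottom_probs


# Toy example (made-up values): 3 inputs, 2 outputs,
# with all top-down probability placed on output 0.
acts = [1.0, 2.0, 0.5]
W = [[0.5, -1.0, 1.0],
     [0.2, 0.3, -0.4]]
saliency = excitation_backprop_layer(acts, W, [1.0, 0.0])
```

In this toy case the second input receives no probability because its only connection to the selected output is inhibitory. Applying such a rule layer by layer down to the input yields a saliency map; how it is carried through recurrent connections and across video frames is what the paper addresses.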

Topics

Artificial Intelligence > Core AI > Interpretability
Computer Vision > Processing > Video Understanding
Computer Vision > Analysis > Video Understanding
Deep Learning > Techniques > Attention
Computer Vision > Core AI > Interpretability

Keywords

video understanding, visual grounding, recurrent neural network, top-down attention, spatiotemporal saliency, excitation backprop, saliency visualization
