SWAG-V: Explanations for Video Using Superpixels Weighted by Average Gradients

Thomas Hartley; Kirill Sidorov; Christopher Willis; David Marshall

WACV 2022

Abstract

CNN architectures that take video as input are often overlooked in the development of explanation techniques, despite their use in critical domains such as surveillance and healthcare. Explanation techniques developed for these networks must account for the additional temporal dimension if they are to be successful. In this paper we introduce SWAG-V, an extension of SWAG for networks that take video as input. By creating superpixels that incorporate individual frames of the input video, we are able to create explanations that better locate the regions of the input that are important to the network's prediction. Using Kinetics-400 with both the C3D and R(2+1)D network architectures, we demonstrate that SWAG-V outperforms Grad-CAM, Grad-CAM++ and Saliency Tubes across a range of common metrics, including explanation accuracy and localisation.
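The idea named in the title, superpixels weighted by average gradients, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the segmentation algorithm, how gradients are obtained, or whether absolute values are used. The sketch assumes per-pixel gradients are already available as a nested list indexed [frame][row][column], uses a hypothetical grid-based `grid_labels` helper as a stand-in for a real superpixel algorithm, and assumes regions are confined to a single frame.

```python
from collections import defaultdict


def grid_labels(num_frames, height, width, cell=2):
    """Hypothetical stand-in for a superpixel algorithm (assumption, not from
    the paper): label each pixel by the frame and grid cell it falls in, so
    that no region spans more than one frame."""
    cells_per_row = (width + cell - 1) // cell
    cells_per_col = (height + cell - 1) // cell
    per_frame = cells_per_row * cells_per_col
    return [
        [
            [t * per_frame + (y // cell) * cells_per_row + (x // cell)
             for x in range(width)]
            for y in range(height)
        ]
        for t in range(num_frames)
    ]


def average_gradient_map(gradients, labels):
    """Assign every pixel the mean absolute gradient of its region.

    Taking absolute values is an assumption made for this sketch."""
    total = defaultdict(float)
    count = defaultdict(int)
    for grad_frame, label_frame in zip(gradients, labels):
        for grad_row, label_row in zip(grad_frame, label_frame):
            for g, region in zip(grad_row, label_row):
                total[region] += abs(g)
                count[region] += 1
    mean = {region: total[region] / count[region] for region in total}
    # Rebuild a volume with the same shape as the input, one weight per pixel.
    return [[[mean[region] for region in row] for row in frame]
            for frame in labels]


# Toy input: one frame of 2 x 4 gradient values, split into two 2 x 2 regions.
toy_gradients = [[[1, -1, 0, 4],
                  [3, -3, 0, 0]]]
toy_labels = grid_labels(num_frames=1, height=2, width=4, cell=2)
toy_heatmap = average_gradient_map(toy_gradients, toy_labels)
```

In practice the labels would come from a proper superpixel method and the gradients from backpropagation through the video network; the averaging step is the only part this sketch is meant to show.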

Topics

Artificial Intelligence > Core AI > Interpretability
Deep Learning > Architectures > Neural Networks

Keywords

video classification, gradient-based method, convolutional neural network
