WACV 2022

SWAG-V: Explanations for Video Using Superpixels Weighted by Average Gradients

Abstract

CNN architectures that take video as input are often overlooked in the development of explanation techniques, despite their use in critical domains such as surveillance and healthcare. To be successful, explanation techniques developed for these networks must take the additional temporal dimension into account. In this paper we introduce SWAG-V, an extension of SWAG for networks that take video as input. By creating superpixels that incorporate individual frames of the input video, we are able to create explanations that better locate the regions of the input that are important to the network's prediction. Using Kinetics-400 with both the C3D and R(2+1)D network architectures, we demonstrate that SWAG-V outperforms Grad-CAM, Grad-CAM++ and Saliency Tubes over a range of common metrics such as explanation accuracy and localisation.
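The idea named in the title, superpixels weighted by average gradients, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a spatio-temporal superpixel label volume and a gradient volume of the same shape (frames, height, width) have already been computed by some other means. The function name `weight_superpixels_by_mean_gradient` and the array layout are hypothetical choices made for illustration only.

```python
import numpy as np


def weight_superpixels_by_mean_gradient(labels, grads):
    """Assign every voxel the mean absolute gradient of its superpixel.

    labels: non-negative integer array of shape (T, H, W), one superpixel id per voxel
            (assumed to be precomputed; how it is produced is outside this sketch).
    grads:  float array of shape (T, H, W), gradient of the class score with
            respect to the input (assumed to be precomputed).
    Returns a float array of shape (T, H, W).
    """
    if labels.shape != grads.shape:
        raise ValueError("labels and grads must have the same shape")

    flat_labels = labels.ravel()
    flat_grads = np.abs(grads).ravel()
    n_superpixels = int(flat_labels.max()) + 1

    # Sum of gradient magnitudes and voxel count per superpixel id.
    sums = np.bincount(flat_labels, weights=flat_grads, minlength=n_superpixels)
    counts = np.bincount(flat_labels, minlength=n_superpixels)

    # Mean per superpixel; ids that never occur keep a weight of zero.
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

    # Broadcast each superpixel's mean back onto its voxels.
    return means[labels]


# Toy input: 2 frames of 2x2 pixels, two superpixels that each span both frames.
toy_labels = np.array([[[0, 0], [1, 1]],
                       [[0, 0], [1, 1]]])
toy_grads = np.array([[[1.0, 3.0], [0.0, 0.0]],
                      [[2.0, 2.0], [4.0, 0.0]]])
toy_map = weight_superpixels_by_mean_gradient(toy_labels, toy_grads)
```

In this toy input, superpixel 0 averages to 2.0 and superpixel 1 to 1.0, so every voxel in a superpixel shares one score across both frames. Letting a superpixel span frames is what allows a single region to carry importance through time in this sketch.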
