Transferable-Guided Attention is All You Need for Video Domain Adaptation

André Sacilotti; Samuel Felipe dos Santos; Nicu Sebe; Jurandy Almeida

WACV 2025

Abstract

Unsupervised domain adaptation (UDA) in videos is a challenging task that remains underexplored compared to image-based UDA. Although vision transformers (ViT) achieve state-of-the-art performance in many computer vision tasks, their use in video UDA has received little attention. Our key idea is to use transformer layers as a feature encoder and to incorporate spatial and temporal transferability relationships into the attention mechanism. We develop a Transferable-guided Attention (TransferAttn) framework that exploits the capacity of the transformer to adapt cross-domain knowledge across different backbones. To improve the transferability of ViT, we introduce a novel and effective module named Domain Transferable-guided Attention Block (DTAB). DTAB compels ViT to focus on the spatio-temporal transferability relationship among video frames by replacing the self-attention mechanism with a transferability attention mechanism. Extensive experiments were conducted on the UCF-HMDB, Kinetics-Gameplay, and Kinetics-NEC Drone datasets with different backbones, such as ResNet101, I3D, and STAM, to verify the effectiveness of TransferAttn compared with state-of-the-art approaches. We also demonstrate that DTAB yields performance gains when applied to other state-of-the-art transformer-based UDA methods from both the video and image domains. Our code is available at https://github.com/Andre-Sacilotti/transferattn-project-code.
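The abstract does not spell out how DTAB computes its transferability attention, so the following is only an illustrative sketch of the general idea of weighting attention by transferability, not the authors' implementation. It assumes a hypothetical per-frame domain discriminator output `domain_prob` (the probability that a frame comes from the source domain), uses the discriminator's entropy as a transferability score (a frame the discriminator cannot place is treated as more transferable), and uses identity query/key/value projections for brevity. All function and parameter names here are assumptions made for illustration.

```python
import numpy as np


def binary_entropy(p, eps=1e-8):
    """Entropy of a Bernoulli(p) variable, normalized to [0, 1]."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p)) / np.log(2.0)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def transferability_guided_attention(frames, domain_prob):
    """Reweight frame-to-frame attention by a transferability score.

    frames:      (T, d) array of per-frame features.
    domain_prob: (T,) hypothetical discriminator output in [0, 1].
    Returns the attended features (T, d) and the attention weights (T, T).
    """
    _, d = frames.shape
    # Plain scaled dot-product self-attention (identity projections, assumed).
    attn = softmax(frames @ frames.T / np.sqrt(d), axis=-1)
    # Assumed transferability score: high when the discriminator is unsure.
    transfer = binary_entropy(domain_prob)
    # Scale each key frame's column by its score, then renormalize each row.
    weighted = attn * transfer[None, :]
    weighted = weighted / weighted.sum(axis=-1, keepdims=True)
    return weighted @ frames, weighted


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 8))
    out, w = transferability_guided_attention(
        feats, np.array([0.5, 0.5, 0.9, 0.1])
    )
    print(out.shape, w.sum(axis=-1))
```

In this sketch, a frame whose `domain_prob` is close to 0 or 1 (clearly domain-specific) contributes little to the attended output, while frames with `domain_prob` near 0.5 keep their ordinary self-attention weight.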

Topics

Machine Learning > Application Areas > Domain Adaptation
Deep Learning > Architectures > Transformers
Computer Vision > Analysis > Video Understanding
Artificial Intelligence > Core AI > Computer Vision

Keywords

vision transformer, domain adaptation, attention mechanism, video understanding, unsupervised domain adaptation, feature encoder
