Deformable Siamese Attention Networks for Visual Object Tracking

Yuechen Yu; Yilei Xiong; Weilin Huang; Matthew R. Scott

2020 CVPR CVPR 2020

Deformable Siamese Attention Networks for Visual Object Tracking

Abstract

Siamese-based trackers have achieved excellent performance on visual object tracking. However, the target template is not updated online, and the features of target template and search image are computed independently in a Siamese architecture. In this paper, we propose Deformable Siamese Attention Networks, referred to as SiamAttn, by introducing a new Siamese attention mechanism that computes deformable self-attention and cross-attention. The self-attention learns strong context information via spatial attention, and selectively emphasizes interdependent channel-wise features with channel attention. The crossattention is capable of aggregating rich contextual interdependencies between the target template and the search image, providing an implicit manner to adaptively update the target template. In addition, we design a region refinement module that computes depth-wise cross correlations between the attentional features for more accurate tracking. We conduct experiments on six benchmarks, where our method achieves new state-of-the-art results, outperforming recent strong baseline, SiamRPN++, by 0.464 to 0.537 and 0.415 to 0.470 EAO on VOT 2016 and 2018.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning

🧭 Keyword Pioneer — deformable attention

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Yuechen Yu , Yilei Xiong , Weilin Huang , Matthew R. Scott

Topics

Deep Learning > Architectures > Transformers Deep Learning > Techniques > Model Architecture Computer Vision > Analysis > Object Tracking Deep Learning > Learning Types > Self-Supervised Learning Deep Learning > Learning Types > Representation Learning Artificial Intelligence > Core AI > Attention

Keywords

deformable convolution attention mechanism visual object tracking cross attention deformable attention siamese network cross correlation

Download PDF

Related papers

Deep Polarization Cues for Transparent Object Segmentation 2020

HRank: Filter Pruning Using High-Rank Feature Map 2020

Panoptic-Based Image Synthesis 2020

Select, Supplement and Focus for RGB-D Saliency Detection 2020

ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings 2020