Exploring Self-Attention for Image Recognition

Hengshuang Zhao; Jiaya Jia; Vladlen Koltun

CVPR 2020

Abstract

Recent work has shown that self-attention can serve as a basic building block for image recognition models. We explore variations of self-attention and assess their effectiveness for image recognition. We consider two forms of self-attention. One is pairwise self-attention, which generalizes standard dot-product attention and is fundamentally a set operator. The other is patchwise self-attention, which is strictly more powerful than convolution. Our pairwise self-attention networks match or outperform their convolutional counterparts, and the patchwise models substantially outperform the convolutional baselines. We also conduct experiments that probe the robustness of learned representations and conclude that self-attention networks may have significant benefits in terms of robustness and generalization.
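The abstract describes pairwise self-attention as a generalization of standard dot-product attention and as a set operator. The following is a minimal sketch of plain dot-product self-attention only, not the paper's pairwise or patchwise formulation. It assumes identity query, key, and value maps and no positional encoding, and the names `self_attention`, `features`, and `perm` are illustrative. It shows one reading of "set operator": reordering the input features reorders the outputs in the same way without changing their values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def self_attention(features):
    """Dot-product self-attention over a collection of feature vectors.

    Assumption: queries, keys, and values are the features themselves
    (identity maps), and no positional information is used.
    """
    out = []
    for x_i in features:
        # Attention weights of x_i over every feature in the collection.
        weights = softmax([dot(x_i, x_j) for x_j in features])
        # Weighted sum of the features, one output dimension at a time.
        out.append([
            sum(w * x_j[d] for w, x_j in zip(weights, features))
            for d in range(len(x_i))
        ])
    return out


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
perm = [2, 0, 1]  # an arbitrary reordering of the inputs
shuffled = [features[p] for p in perm]

out = self_attention(features)
out_shuffled = self_attention(shuffled)

# Each output follows its input to the new position; values agree up to
# floating-point rounding.
for k, p in enumerate(perm):
    assert all(abs(a - b) < 1e-9 for a, b in zip(out_shuffled[k], out[p]))
```

Because the weights depend only on feature content, any positional structure would have to be supplied separately; how the paper handles this is not covered by the abstract.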

Topics

Deep Learning > Architectures > Transformers
Deep Learning > Architectures > Neural Networks
Computer Vision > Core AI > Computer Vision
Artificial Intelligence > Core AI > Attention

Keywords

vision transformer, computer vision, image recognition, convolutional neural network, pairwise attention, patchwise attention, pairwise self-attention, patchwise self-attention

Related papers

Deep Polarization Cues for Transparent Object Segmentation 2020

HRank: Filter Pruning Using High-Rank Feature Map 2020

Panoptic-Based Image Synthesis 2020

Select, Supplement and Focus for RGB-D Saliency Detection 2020

ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings 2020