SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks

Xinyu Shi; Zecheng Hao; Zhaofei Yu

2024 CVPR CVPR 2024

SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks

Abstract

The remarkable success of Vision Transformers in Artificial Neural Networks (ANNs) has led to a growing interest in incorporating the self-attention mechanism and transformer-based architecture into Spiking Neural Networks (SNNs). While existing methods propose spiking self-attention mechanisms that are compatible with SNNs they lack reasonable scaling methods and the overall architectures proposed by these methods suffer from a bottleneck in effectively extracting local features. To address these challenges we propose a novel spiking self-attention mechanism named Dual Spike Self-Attention (DSSA) with a reasonable scaling method. Based on DSSA we propose a novel spiking Vision Transformer architecture called SpikingResformer which combines the ResNet-based multi-stage architecture with our proposed DSSA to improve both performance and energy efficiency while reducing parameters. Experimental results show that SpikingResformer achieves higher accuracy with fewer parameters and lower energy consumption than other spiking Vision Transformer counterparts. Notably our SpikingResformer-L achieves 79.40% top-1 accuracy on ImageNet with 4 time-steps which is the state-of-the-art result in the SNN field.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Xinyu Shi , Zecheng Hao , Zhaofei Yu

Topics

Artificial Intelligence > Core AI > Multimodal Learning Deep Learning > Architectures > Transformers Deep Learning > Architectures > Neural Networks

Keywords

image classification vision transformer self-attention mechanism energy efficiency spiking neural network

Download PDF

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024