CoL2A: Convolution-free Local Linear Attention for SpatioTemporal Event Processing
Abstract
Linear attention is sparse, recurrent, and GPU-parallel; these features are essential for processing sparse data from event-based cameras. We argue that locality is the missing ingredient for efficiently modeling event-to-event relationships in continuous spatiotemporal perception. We propose CoL2A, which introduces locality into linear attention without a computationally demanding convolution operation. The key idea behind the convolution-free formulation is to restrict the local convolutional kernel of the positional embedding to a special class that decomposes into two global positional embeddings, which can be absorbed into the query and key; this replaces the convolution with a local sum. To the best of our knowledge, CoL2A is the first method to combine sparsity, recurrence, GPU parallelism, and locality simultaneously. We evaluate CoL2A on a dense, high-temporal-resolution (> 1000 fps) prediction task from events, showing real-time capability while maintaining results competitive with the conventional method.
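To make the decomposition idea concrete, the following is a minimal illustrative sketch, not the formulation used in CoL2A. It assumes a 1D sequence with scalar positions, an index-based causal window, an unnormalized attention output, and an exponential kernel w(p_i - p_j) = exp(-lam (p_i - p_j)), chosen only because it factorizes as exp(-lam p_i) exp(lam p_j). All function names and parameters (`lam`, `window`, `pos`) are hypothetical. Under these assumptions the two factors can be absorbed into the query and key, and the kernel-weighted sum over the window becomes a plain local sum, computed here as a difference of prefix sums.

```python
import numpy as np

def local_linear_attention_naive(q, k, v, pos, lam, window):
    """Reference: explicit kernel-weighted sum over a causal window (assumed setup)."""
    n = q.shape[0]
    out = np.zeros_like(v, dtype=float)
    for i in range(n):
        lo = max(0, i - window + 1)
        for j in range(lo, i + 1):
            w = np.exp(-lam * (pos[i] - pos[j]))  # local kernel value
            out[i] += w * (q[i] @ k[j]) * v[j]
    return out

def local_linear_attention_factorized(q, k, v, pos, lam, window):
    """Same result, with the kernel absorbed into q and k and a local sum.

    Assumes window >= 1. exp(lam * pos) can overflow for large positions;
    this sketch ignores that and is meant for small inputs only.
    """
    q_t = q * np.exp(-lam * pos)[:, None]   # absorb exp(-lam * p_i) into the query
    k_t = k * np.exp(lam * pos)[:, None]    # absorb exp(+lam * p_j) into the key
    kv = np.einsum("nd,ne->nde", k_t, v)    # per-token outer products k_j v_j^T
    prefix = np.cumsum(kv, axis=0)          # running sum over tokens 0..i
    shifted = np.zeros_like(prefix)
    shifted[window:] = prefix[:-window]     # running sum over tokens 0..i-window
    local = prefix - shifted                # sum over the last `window` tokens
    return np.einsum("nd,nde->ne", q_t, local)
```

Both functions take `q` and `k` of shape (n, d), `v` of shape (n, e), and `pos` of shape (n,). The factorized version never evaluates the kernel per pair, which is the sense in which a convolution-like weighting is replaced by a local sum in this toy setting.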