QuantAttack: Exploiting Quantization Techniques to Attack Vision Transformers
Abstract
In recent years, there has been a significant trend in deep neural networks (DNNs), particularly transformer-based models, toward developing ever-larger and more capable models. While these models demonstrate state-of-the-art performance, their growing scale requires increased computational resources (e.g., GPUs with greater memory capacity). To address this problem, quantization techniques (i.e., low-bit-precision representation and matrix multiplication) have been proposed. Most quantization techniques employ a static strategy in which the model parameters are quantized, either during training or inference, without considering the test-time sample. In contrast, dynamic quantization techniques, which have become increasingly popular, adapt during inference based on the input provided while maintaining full-precision performance. However, their dynamic behavior and average-case performance assumption make them vulnerable to a novel threat vector: adversarial attacks that target the model's efficiency and availability. In this paper, we present QuantAttack, a novel attack that targets the availability of quantized vision transformers, slowing down inference and increasing memory usage and energy consumption. The source code is available online.
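To make the input dependence of dynamic quantization concrete, the following is a minimal sketch of one possible dynamic scheme, a mixed-precision decomposition in the spirit of LLM.int8(). The abstract does not specify which scheme is targeted, so the scheme itself, the function names (quantize_absmax, dynamic_matmul), and the outlier threshold of 6.0 are all illustrative assumptions rather than the authors' implementation. Input columns containing a large-magnitude value are multiplied in full precision; the remaining columns are quantized to 8-bit integers with a scale computed from the input at inference time.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_absmax(values: List[float]) -> Tuple[List[int], float]:
    """Absmax quantization to the int8 range [-127, 127].

    The scale is derived from the values themselves, so for activations
    it depends on the test-time input.
    """
    absmax = max((abs(v) for v in values), default=0.0)
    scale = absmax / 127.0 if absmax > 0.0 else 1.0
    return [round(v / scale) for v in values], scale


def dynamic_matmul(x: Matrix, w: Matrix, threshold: float = 6.0) -> Tuple[Matrix, int]:
    """Compute x @ w with input-dependent mixed precision.

    Returns the result and the number of input columns that had to be
    processed in full precision. The threshold is an assumed value.
    """
    n_in, n_out = len(w), len(w[0])
    # A column is an outlier if any entry reaches the threshold in magnitude.
    outlier = [j for j in range(n_in) if any(abs(row[j]) >= threshold for row in x)]
    outlier_set = set(outlier)
    regular = [j for j in range(n_in) if j not in outlier_set]

    # Quantize each weight column, restricted to the regular input dimensions.
    w_q = [quantize_absmax([w[j][k] for j in regular]) for k in range(n_out)]

    result: Matrix = []
    for row in x:
        x_q, x_scale = quantize_absmax([row[j] for j in regular])
        out_row = []
        for k in range(n_out):
            w_ints, w_scale = w_q[k]
            # Low-precision path: integer accumulation, then rescaling.
            acc = sum(a * b for a, b in zip(x_q, w_ints))
            low = acc * x_scale * w_scale
            # High-precision path: outlier columns stay in floating point.
            high = sum(row[j] * w[j][k] for j in outlier)
            out_row.append(low + high)
        result.append(out_row)
    return result, len(outlier)


if __name__ == "__main__":
    w = [[0.5, -1.0], [1.0, 0.25], [-0.75, 2.0], [0.1, 0.3]]
    benign = [[1.0, -2.0, 0.5, 3.0]]
    crafted = [[7.0, -8.0, 6.5, 9.0]]
    print("benign outlier columns:", dynamic_matmul(benign, w)[1])
    print("crafted outlier columns:", dynamic_matmul(crafted, w)[1])
```

In this sketch, the amount of work routed through the full-precision path is determined by the input rather than by the model alone. An input with many large activations pushes more columns onto that more expensive path, which is the kind of average-case assumption an availability attack could plausibly exploit.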