WACV 2026

IPTQ-ViT: Post-Training Quantization of Non-linear Functions for Integer-only Vision Transformers

Abstract

Quantization-Aware Training (QAT) for vision transformers relies on expensive retraining to recover the accuracy lost when quantizing non-linear layers, which limits its use in resource-constrained environments. In contrast, existing Post-Training Quantization (PTQ) methods either quantize non-linear functions only partially or adjust activation distributions to maintain accuracy, but neither approach achieves fully integer-only inference. In this paper, we introduce IPTQ-ViT, a PTQ framework for fully integer-only vision transformers that requires no retraining. We present two approximation functions designed to improve approximation accuracy in PTQ: a polynomial-based GELU optimized for vision data and a bit-shifting-based Softmax. In addition, we propose a unified metric that integrates quantization sensitivity, perturbation, and computational complexity to select the optimal approximation function for each activation layer. IPTQ-ViT consistently outperforms previous PTQ methods under W8A8 and W4A8, improving top-1 accuracy for image classification by up to 6.44 percentage points (1.78 on average) and object detection by 1.0 mAP. IPTQ-ViT also achieves accuracy and latency comparable to integer-only QAT methods.
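The abstract does not give IPTQ-ViT's GELU polynomial, its Softmax shift scheme, or the formula of its selection metric, so the sketch below uses stand-ins purely to illustrate the two kinds of approximation it names; it is not the paper's method. The GELU part borrows the second-order erf polynomial from I-BERT (constants `A = -0.2888`, `B = -1.769`), shown in floating point for readability. The Softmax part is an assumed base-2 scheme: exp(x) is rewritten as 2^(x·log2 e), log2 e ≈ 1.4427 is approximated with shifts as 1 + 1/2 − 1/16, the integer part of the exponent becomes a right shift, and 2^(−r) on the fractional part is approximated as 1 − r/2. The names `erf_poly`, `gelu_poly`, `shift_softmax`, `frac_bits`, and `out_bits` are hypothetical.

```python
import math
from typing import List

# Hypothetical stand-in: I-BERT's second-order erf polynomial constants,
# NOT the IPTQ-ViT GELU polynomial (which the abstract does not specify).
A = -0.2888
B = -1.769


def erf_poly(x: float) -> float:
    """Piecewise second-order polynomial approximation of erf(x)."""
    if x == 0:
        return 0.0
    clipped = min(abs(x), -B)            # clip |x| to [0, 1.769]
    value = A * (clipped + B) ** 2 + 1.0  # equals 1.0 once |x| >= 1.769
    return math.copysign(value, x)       # erf is odd, so reapply the sign


def gelu_poly(x: float) -> float:
    """GELU(x) = x/2 * (1 + erf(x / sqrt(2))) with erf replaced by erf_poly."""
    return 0.5 * x * (1.0 + erf_poly(x / math.sqrt(2.0)))


def shift_softmax(q: List[int], frac_bits: int = 8, out_bits: int = 8) -> List[int]:
    """Integer-only Softmax over fixed-point logits x = q / 2**frac_bits.

    Hypothetical shift-based scheme: uses only integer add, shift, mask, and
    one integer division per element. Returns probabilities scaled by
    2**out_bits.
    """
    one = 1 << frac_bits
    q_max = max(q)
    exps = []
    for qi in q:
        d = qi - q_max                 # d <= 0, so every exponent is <= 0
        y = d + (d >> 1) - (d >> 4)    # ~ d * log2(e), still in fixed point
        t = -y                         # t >= 0
        n = t >> frac_bits             # integer part of -y
        r = t & (one - 1)              # fractional part of -y, scaled by 2**frac_bits
        mantissa = one - (r >> 1)      # ~ 2**(-r), scaled by 2**frac_bits
        exps.append(mantissa >> n)     # multiply by 2**(-n) with a right shift
    total = sum(exps)                  # > 0, since the max logit contributes `one`
    return [(e << out_bits) // total for e in exps]


if __name__ == "__main__":
    logits = [1.0, 2.0, 3.0]
    q = [round(v * 256) for v in logits]  # 8 fractional bits
    print(shift_softmax(q))               # [23, 65, 167]; /256 ~ [0.090, 0.254, 0.652]
    print(round(gelu_poly(1.0), 4))       # close to exact GELU(1.0) ~ 0.8413
```

Both approximations trade a bounded error for cheaper arithmetic. In this sketch the polynomial GELU stays within about 0.02 of the exact function, and the shift-based Softmax gives roughly [0.090, 0.254, 0.652] for logits [1, 2, 3] against the exact [0.090, 0.245, 0.665]. Choosing between such candidates per layer is the role the abstract assigns to the unified metric.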
