Q-ROAR: Outlier-Aware Rescaling for RoPE Position Interpolation in Quantized Long-Context LLMs (Student Abstract)
Abstract
Abstract Extending LLM context windows is key for long-range tasks. RoPE-based position interpolation (PI) scales input length without retraining, and post-training quantization (PTQ) enables efficient deployment; however, combining PI with PTQ degrades accuracy due to long-context aliasing, dynamic-range dilation, axis-grid anisotropy, and outlier shifts that induce position-dependent logit noise. We give the first systematic analysis of PI+PTQ and propose two diagnostics: Interpolation Pressure (per-band phase-scaling sensitivity) and Tail Inflation Ratio (outlier shift from short to long contexts). We then introduce Q-ROAR, a RoPE-aware, weight-only stabilization that bands RoPE dimensions and lightly searches per-band scales for W_Q,W_K, with an optional symmetric variant. Q-ROAR needs only a tiny long-context dev set and no fine-tuning or kernel changes, recovering up to 0.7% accuracy and more than 14% GovReport perplexity reduction while preserving short-context performance.