Mix-QSAM2: Mixed-Precision Quantization for High Fidelity Segmentation in Resource Constrained Scenarios

Yuzhe Duan; Xuanxuan Ren; Guizhe Dong; Xu Yang; Yanhua Yang

AAAI 2026

Abstract

The Segment Anything Model 2 (SAM2) has set a new benchmark for high-precision image and video segmentation, with broad potential across computer vision tasks. Despite this performance, the model's substantial computational and memory requirements hinder its practical deployment on resource-constrained devices. In this paper, we introduce a novel framework for optimizing SAM2 through two synergistic, importance-driven strategies: quantization and memory management. First, an Importance-driven Mixed-Precision Quantization scheme analyzes the sensitivity of each layer using a Weight-Activation Importance Score and assigns bit-widths accordingly, preserving model accuracy by keeping critical layers at higher precision. Second, the Selective Importance-driven Synthesis (SIS) mechanism addresses the inefficient accumulation of redundant data in the memory bank. SIS compresses the memory by identifying the most contextually similar historical frames and synthesizing them into a single representative feature, thereby preserving informational diversity while enhancing temporal context understanding. Extensive experiments on the COCO and SA-V benchmarks validate our approach, showing that our optimized model consistently outperforms state-of-the-art quantization methods. Our work provides a principled framework for the co-design of quantization and dynamic memory management, offering a practical path toward deploying powerful video segmentation models in real-world applications.
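The two strategies named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives neither the formula for the Weight-Activation Importance Score nor the synthesis rule used by SIS. The sketch therefore assumes that per-layer scores are already available, that the top-scoring fraction of layers is kept at 8 bits while the rest drop to 4 bits, that frame similarity is cosine similarity, and that synthesis is a plain average. The function names `assign_bit_widths` and `compress_memory` and all parameters are hypothetical.

```python
import math


def assign_bit_widths(scores, high_bits=8, low_bits=4, high_fraction=0.25):
    """Map each layer name to a bit-width based on its importance score.

    Assumption: `scores` already holds one importance value per layer.
    The real Weight-Activation Importance Score is not defined in the abstract.
    """
    if not scores:
        return {}
    # Keep at least one layer at the higher precision.
    n_high = max(1, round(len(scores) * high_fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    critical = set(ranked[:n_high])
    return {name: (high_bits if name in critical else low_bits) for name in scores}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def compress_memory(bank, capacity):
    """Merge the most similar pair of frame features until `capacity` is met.

    Assumption: merging by element-wise averaging stands in for the
    synthesis step; the paper's actual rule may differ.
    """
    bank = [list(feature) for feature in bank]  # do not mutate the caller's data
    while len(bank) > capacity and len(bank) > 1:
        pairs = ((i, j) for i in range(len(bank)) for j in range(i + 1, len(bank)))
        i, j = max(pairs, key=lambda p: cosine(bank[p[0]], bank[p[1]]))
        bank[i] = [(a + b) / 2 for a, b in zip(bank[i], bank[j])]
        del bank[j]  # j > i, so index i stays valid
    return bank


if __name__ == "__main__":
    layer_scores = {"attn.qkv": 0.9, "mlp.fc1": 0.1, "mlp.fc2": 0.5, "neck.conv": 0.2}
    print(assign_bit_widths(layer_scores))
    print(compress_memory([[1, 0], [0.9, 0.1], [0, 1]], capacity=2))
```

Merging only the closest pair leaves dissimilar frames untouched, which is one simple way to keep the memory bank diverse while bounding its size.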

Topics

Machine Learning > Application Areas > Efficient Computing; Deep Learning > Architectures > Neural Networks; Computer Vision > Processing > Image Segmentation

Keywords

semantic segmentation; video segmentation; model architecture; deep learning; memory management; mixed precision quantization
