Adaptive Compute Efficient Learning via Conceptual-Criticality (Student Abstract)
Abstract
The computational cost of large language models (LLMs) is a primary obstacle to sustainable deployment. Static resource allocation is inefficient, as not all inputs require the same depth of processing. We propose a framework for adaptive, compute-efficient learning via conceptual criticality, which dynamically tailors computation to the assessed difficulty of an input. A lightweight criticality prediction module estimates conceptual complexity on a continuous scale, and this score governs the LLM’s inference pathway, selectively activating token pruning, layer skipping, and quantization. Simple inputs are processed with minimal FLOPs and latency, while complex inputs use the model’s full capacity to preserve accuracy. We benchmark our framework and introduce metrics to quantify sensitivity to input criticality and per-sample computational savings. Results demonstrate an improved accuracy-efficiency trade-off, paving the way for more resource-aware systems.
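As an illustration of how a continuous criticality score might govern the inference pathway, the following minimal sketch maps a score in [0, 1] to settings for token pruning, layer skipping, and quantization. It is not the framework's implementation: the names `InferencePlan`, `plan_inference`, and `relative_cost`, the linear schedules, the thresholds 0.33 and 0.66, the bit widths, and the 24-layer default are all assumptions chosen for illustration, and the cost proxy is a rough stand-in rather than the metrics introduced in this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferencePlan:
    """Hypothetical per-input settings selected from a criticality score."""
    token_keep_ratio: float  # fraction of tokens kept after pruning
    layers_to_run: int       # number of layers executed (the rest are skipped)
    weight_bits: int         # quantization bit width


def plan_inference(criticality: float, total_layers: int = 24) -> InferencePlan:
    """Map a criticality score in [0, 1] to an inference plan.

    All schedules and thresholds below are illustrative assumptions.
    """
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality must lie in [0, 1]")

    # Token pruning: keep between 50% and 100% of tokens, linear in the score.
    keep = 0.5 + 0.5 * criticality

    # Layer skipping: run between half of the layers and all of them.
    min_layers = total_layers // 2
    layers = min_layers + round((total_layers - min_layers) * criticality)

    # Quantization: simpler inputs tolerate a lower bit width.
    if criticality < 0.33:
        bits = 4
    elif criticality < 0.66:
        bits = 8
    else:
        bits = 16

    return InferencePlan(keep, layers, bits)


def relative_cost(plan: InferencePlan, total_layers: int = 24) -> float:
    """Crude compute proxy relative to full inference (1.0 = no savings).

    Assumes cost scales with tokens kept times layers run and ignores
    quantization effects.
    """
    return plan.token_keep_ratio * plan.layers_to_run / total_layers


if __name__ == "__main__":
    for score in (0.0, 0.5, 1.0):
        plan = plan_inference(score)
        print(score, plan, relative_cost(plan))
```

Under these assumptions, an input scored 0.0 runs half the layers on half the tokens at 4-bit precision, while an input scored 1.0 uses the full model. In practice the score would come from the lightweight criticality prediction module, and the mapping from score to settings would be tuned rather than fixed by hand.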