WACV 2026

Pretraining Helps When Capacity Allows: Evidence from Ultra-Small ConvNets

Abstract

Robust visual recognition on embedded platforms requires models that both generalize out-of-distribution (OOD) and fit within tiny compute and memory budgets. While pretraining is a standard route to robustness for mid-sized and large backbones, its value in the ultra-small regime remains unclear. We present a capacity-aware study of pretraining for two efficient ConvNet families (EfficientNet and MobileNetV3), scaled from "small" to "ultra-small" via a simple, reproducible recipe. We compare three initializations (ImageNet->COCO pretraining, ImageNet classification pretraining, and training from scratch) along two axes of distribution shift: (i) cross-dataset RGB-to-RGB transfer between LLVIP and FLIR, and (ii) cross-modality detection, where models are fine-tuned on RGB and evaluated on infrared (IR). A complementary classification study on DomainNet probes whether the trends extend beyond detection. Across settings, we find that the benefit of pretraining is conditional on both backbone capacity and shift difficulty. Task-aligned ImageNet->COCO pretraining is the most reliable starting point at moderate sizes and for the easier transfer direction. In low-capacity regimes, differences between initializations are typically within run-to-run variation, and training from scratch can match or surpass pretraining. Classification mirrors this capacity gating. Our results challenge the premise that "pretraining always helps" and instead quantify when task-aligned pretraining pays off for ultra-small backbones and when it likely does not. (The code will be made available online after acceptance.)
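The abstract does not state how "within run-to-run variation" is decided. One common way to make such a claim concrete is sketched below, under the assumption that each initialization is trained with several random seeds and that a gap counts as meaningful only when it exceeds k times the pooled across-seed standard deviation. The function name `gap_exceeds_seed_noise` and the threshold `k` are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def gap_exceeds_seed_noise(scores_a, scores_b, k=1.0):
    """Compare two initializations using per-seed scores (e.g. mAP).

    Hypothetical criterion: the mean gap is treated as meaningful only if
    its magnitude exceeds k times the pooled sample standard deviation
    across seeds. Returns (gap, noise_threshold, exceeds).
    """
    na, nb = len(scores_a), len(scores_b)
    if na < 2 or nb < 2:
        raise ValueError("need at least two seeds per initialization")

    # Difference of the per-initialization means.
    gap = mean(scores_a) - mean(scores_b)

    # Pooled sample standard deviation across both sets of seeds.
    sa, sb = stdev(scores_a), stdev(scores_b)
    pooled = (((na - 1) * sa**2 + (nb - 1) * sb**2) / (na + nb - 2)) ** 0.5

    noise = k * pooled
    return gap, noise, abs(gap) > noise
```

With this rule, a pretrained-versus-scratch gap that is smaller than the seed-to-seed spread would be reported as "within run-to-run variation". A stricter variant could replace the fixed `k` with a t-test or a bootstrap confidence interval.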
