WACV 2026

From Few-Shot to Zero-Shot Pallet Load Recognition: A Deployed Embedding-Based Vision System for Industrial Logistics

Abstract

Automated pallet load recognition is a critical task in industrial logistics, but deploying conventional deep learning systems is often infeasible. Their reliance on large, manually annotated datasets creates a prohibitive bottleneck in cost and time, especially in dynamic environments where product lines change frequently. To overcome this challenge, we introduce a flexible, dual-mode vision system built upon dense patch embeddings. Our primary, few-shot approach leverages features from the CAPI vision model to construct a compact memory bank from as few as one labeled example per class. Classification is then performed via a simple yet effective k-nearest neighbor search. For annotation-free scenarios, we also propose a zero-shot mode that identifies the load by finding the rectangular region that minimizes intra-class feature variance. We demonstrate state-of-the-art performance on a new, challenging industrial dataset, where our few-shot method attains a mAP_{50-95} above 90% with only one support image per class. Additionally, the fully unsupervised approach achieves a mAP_{50-95} of up to 75%. The system's robustness and practical value were validated through its successful deployment in high-stakes, real-world scenarios. Our findings establish a basis for lightweight solutions that support the rapid, data-efficient integration of new vision systems into industrial workflows. To facilitate reproducibility, the source code is released at https://github.com/juanjesus-ldo/F2ZLR.
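
The two modes described in the abstract can be illustrated with a minimal NumPy sketch. It is not the released implementation: the function names are hypothetical, CAPI feature extraction is omitted (patch embeddings are taken as given arrays), and cosine similarity with majority voting is an assumed choice for the k-nearest neighbor step. For the zero-shot mode, "minimizes intra-class feature variance" is read here as a two-class split (patches inside versus outside a rectangle) scored by the summed within-class squared deviation, searched exhaustively over the patch grid.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def knn_classify(query_feats, bank_feats, bank_labels, k=3):
    """Few-shot mode (assumed form): majority vote over the k most similar
    memory-bank patches. bank_labels must hold non-negative integer class ids."""
    sims = l2_normalize(query_feats) @ l2_normalize(bank_feats).T  # (Nq, Nb)
    nearest = np.argsort(-sims, axis=1)[:, :k]                     # top-k indices
    votes = bank_labels[nearest]                                   # (Nq, k)
    n_classes = int(bank_labels.max()) + 1
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
    return counts.argmax(axis=1)

def best_rectangle(feat_map):
    """Zero-shot mode (assumed form): return (y0, x0, y1, x1), end-exclusive,
    for the rectangle whose inside/outside split has the lowest summed
    within-class squared deviation. feat_map has shape (H, W, D)."""
    H, W, D = feat_map.shape
    # Zero-padded integral image of feature sums for O(1) rectangle sums.
    integ = np.zeros((H + 1, W + 1, D))
    integ[1:, 1:] = feat_map.cumsum(axis=0).cumsum(axis=1)
    total_sum = integ[H, W]
    total_sq = float((feat_map ** 2).sum())
    n_total = H * W
    best, best_cost = None, np.inf
    for y0 in range(H):
        for y1 in range(y0 + 1, H + 1):
            for x0 in range(W):
                for x1 in range(x0 + 1, W + 1):
                    n_in = (y1 - y0) * (x1 - x0)
                    n_out = n_total - n_in
                    if n_out == 0:
                        continue  # the whole grid is not a valid split
                    s_in = integ[y1, x1] - integ[y0, x1] - integ[y1, x0] + integ[y0, x0]
                    s_out = total_sum - s_in
                    # Within-class SSE = sum ||x||^2 - ||sum x||^2 / n, per class.
                    cost = total_sq - s_in @ s_in / n_in - s_out @ s_out / n_out
                    if cost < best_cost:
                        best, best_cost = (y0, x0, y1, x1), cost
    return best

# Toy usage with synthetic embeddings (illustrative values only).
bank = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
                 [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]])
labels = np.array([0, 0, 0, 1, 1, 1])
queries = np.array([[0.9, 0.1], [0.2, 0.8]])
print(knn_classify(queries, bank, labels, k=3))

grid = np.zeros((6, 6, 2))
grid[..., 0] = 1.0             # background patches
grid[1:4, 2:5] = [0.0, 1.0]    # "load" patches
print(best_rectangle(grid))
```

The exhaustive search visits on the order of H²W² rectangles, which is affordable on a coarse patch grid. The integral image keeps each candidate evaluation constant-time in the number of patches.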
