Counting Stacked Objects

Corentin Dumery; Noa Etté; Aoxiang Fan; Ren Li; Jingyi Xu; Hieu Le; Pascal Fua

ICCV 2025

Abstract

Visual object counting is a fundamental computer vision task underpinning numerous real-world applications, from cell counting in biomedicine to traffic and wildlife monitoring. However, existing methods struggle with stacked 3D objects, where most objects are hidden by those above them. To address this important yet underexplored problem, we propose a novel 3D counting approach that decomposes the task into two complementary subproblems: estimating the 3D geometry of the object stack and estimating the occupancy ratio from multi-view images. By combining geometric reconstruction and deep learning-based depth analysis, our method can accurately count identical objects within containers, even when they are irregularly stacked. We validate our 3D counting pipeline on large-scale synthetic and diverse real-world datasets with manually verified total counts.
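The decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two estimates are combined, so the formula below (count ≈ stack volume × occupancy ratio / volume of one object) is an assumption made for illustration, and the function name `estimate_count` and its parameters are hypothetical rather than taken from the paper.

```python
def estimate_count(stack_volume: float, occupancy_ratio: float, unit_volume: float) -> int:
    """Combine the two sub-estimates into an object count.

    Assumed relation (not stated in the abstract):
        count ~= stack_volume * occupancy_ratio / unit_volume

    stack_volume    -- volume enclosed by the reconstructed stack geometry
    occupancy_ratio -- fraction of that volume filled by objects, in [0, 1]
    unit_volume     -- volume of a single object, same units as stack_volume
    """
    if stack_volume < 0:
        raise ValueError("stack_volume must be non-negative")
    if not 0.0 <= occupancy_ratio <= 1.0:
        raise ValueError("occupancy_ratio must lie in [0, 1]")
    if unit_volume <= 0:
        raise ValueError("unit_volume must be positive")

    # Volume actually occupied by objects, divided by the volume of one object.
    occupied_volume = stack_volume * occupancy_ratio
    return round(occupied_volume / unit_volume)


if __name__ == "__main__":
    # Hypothetical numbers: a 2000 cm^3 stack, half filled, with 4 cm^3 objects.
    print(estimate_count(2000.0, 0.5, 4.0))
```

Under this assumed relation, an error in either sub-estimate scales the count proportionally, which is one reason treating geometry and occupancy as separate subproblems is convenient to reason about.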

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Computer Vision > Analysis > 3D Vision
Computer Vision > Analysis > Object Detection
Deep Learning > Learning Types > Multi-Modal Learning

Keywords

3D reconstruction; geometry estimation; depth estimation; object counting; multi-view image; multi-view geometry; visual object detection; depth analysis
