Dual-S3D: Hierarchical Dual-Path Selective SSM-CNN for High-Fidelity Implicit Reconstruction

Luoxi Zhang; Pragyan Shrestha; Yu Zhou; Chun Xie; Itaru Kitahara

ICCV 2025

Abstract

Single-view 3D reconstruction aims to recover the complete 3D geometry and appearance of objects from a single RGB image. The task remains challenging because a single image provides incomplete and ambiguous information. Existing methods struggle to balance local detail against global topology, and they suffer from interference caused by fusing RGB and depth early during signed distance function (SDF) optimization. To address these challenges, we propose Dual-S3D, a novel framework for single-view 3D reconstruction. Our method employs a hierarchical dual-path feature extraction strategy: early stages use convolutional neural networks to anchor local geometric details, while subsequent stages use a Transformer integrated with a selective state-space model to capture global topology, improving scene understanding and feature representation. Additionally, we design an auxiliary branch that progressively fuses precomputed depth features with pixel-level features, effectively decoupling visual and geometric cues. Extensive experiments on the 3D-FRONT and Pix3D datasets show that our approach significantly outperforms existing methods, reducing Chamfer distance by 51%, increasing F-score by 33.6%, and improving normal consistency by 10.3%, and achieves state-of-the-art reconstruction quality.
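The abstract reports its gains in terms of Chamfer distance, F-score, and normal consistency. As a point of reference, the sketch below shows one common way these three metrics are computed between points (with normals) sampled from a predicted and a ground-truth surface. It is not the authors' evaluation code. The squared-L2 Chamfer convention, the threshold `tau=0.01`, the symmetric averaging of normal consistency, and the function names `nearest_neighbors` and `reconstruction_metrics` are all assumptions. Published protocols differ in these choices (and in how meshes are normalized and sampled), so numbers from this sketch are not directly comparable to the reported percentages.

```python
import numpy as np

# Assumed conventions (not taken from the paper): squared-L2 Chamfer,
# F-score at a fixed distance threshold, symmetric |cos| normal consistency.


def nearest_neighbors(src, dst):
    """For each point in src (N, 3), return the index of and distance to
    its nearest point in dst (M, 3). Brute force, O(N * M) memory."""
    diff = src[:, None, :] - dst[None, :, :]   # (N, M, 3) pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)       # (N, M) Euclidean distances
    idx = dist.argmin(axis=1)                  # (N,) nearest index in dst
    return idx, dist[np.arange(len(src)), idx]


def reconstruction_metrics(pred_pts, pred_nrm, gt_pts, gt_nrm, tau=0.01):
    """Chamfer distance, F-score@tau and normal consistency between a
    predicted and a ground-truth point set (hypothetical helper)."""
    idx_p2g, d_p2g = nearest_neighbors(pred_pts, gt_pts)
    idx_g2p, d_g2p = nearest_neighbors(gt_pts, pred_pts)

    # Symmetric Chamfer: mean squared NN distance in each direction, summed.
    chamfer = np.mean(d_p2g ** 2) + np.mean(d_g2p ** 2)

    # Precision: fraction of predicted points within tau of the ground truth.
    # Recall: fraction of ground-truth points within tau of the prediction.
    precision = np.mean(d_p2g < tau)
    recall = np.mean(d_g2p < tau)
    if precision + recall == 0:
        fscore = 0.0
    else:
        fscore = 2 * precision * recall / (precision + recall)

    # Normal consistency: mean |cos| between unit normals of NN pairs,
    # averaged over both directions (sign-agnostic).
    def unit(n):
        return n / np.linalg.norm(n, axis=-1, keepdims=True)

    pn, gn = unit(pred_nrm), unit(gt_nrm)
    nc_p2g = np.abs(np.sum(pn * gn[idx_p2g], axis=-1)).mean()
    nc_g2p = np.abs(np.sum(gn * pn[idx_g2p], axis=-1)).mean()
    normal_consistency = 0.5 * (nc_p2g + nc_g2p)

    return {
        "chamfer": float(chamfer),
        "fscore": float(fscore),
        "normal_consistency": float(normal_consistency),
    }


if __name__ == "__main__":
    # Toy example: a flat 10 x 10 grid compared with itself.
    pts = np.array([[x, y, 0.0] for x in np.arange(0, 1, 0.1)
                    for y in np.arange(0, 1, 0.1)])
    nrm = np.tile([0.0, 0.0, 1.0], (len(pts), 1))
    print(reconstruction_metrics(pts, nrm, pts, nrm))
    # → {'chamfer': 0.0, 'fscore': 1.0, 'normal_consistency': 1.0}
```

Brute-force nearest-neighbor search keeps the sketch dependency-free. For the tens of thousands of samples typical in evaluation, a KD-tree such as `scipy.spatial.cKDTree` would be the usual substitute.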

Topics

Machine Learning > Core Methods > Representation Learning; Deep Learning > Architectures > Transformers; Deep Learning > Architectures > Neural Networks; Computer Vision > Analysis > 3D Vision

Keywords

state-space model, convolutional neural network, signed distance function, single-view 3D reconstruction, local detail, hierarchical feature extraction, implicit reconstruction, global topology
